Message ID | 20211014142547.251144-1-aldyh@redhat.com |
---|---|
State | New |
Headers | show |
Series | Cleanup --params for backward threader. | expand |
On 10/14/2021 8:25 AM, Aldy Hernandez wrote: > The new backward threader makes some of the --param knobs used to > control it questionable at best or no longer applicable at worst. > > The fsm-maximum-phi-arguments param is unused and can be removed. > > The max-fsm-thread-length param is block based which is a bit redundant, > since we already restrict paths based on instruction estimates. > > The max-fsm-thread-paths restricts the total number of threadable paths > in a function. We probably don't need this. Besides, the forward > threader has no such restriction. > > OK pending tests? > > gcc/ChangeLog: > > * doc/invoke.texi: Remove max-fsm-thread-length, > max-fsm-thread-paths, and fsm-maximum-phi-arguments. > * params.opt: Same. > * tree-ssa-threadbackward.c (back_threader::back_threader): Remove > argument. > (back_threader_registry::back_threader_registry): Same. > (back_threader_profitability::profitable_path_p): Remove > param_max_fsm_thread-length. > (back_threader_registry::register_path): Remove > m_max_allowable_paths. OK. I don't think any of those params were addressing pathological cases and thread-length is really handled better by costing based on the # statements. jeff
Hi Aldy! On Thu, 2021-10-14 16:25:48 +0200, Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > The new backward threader makes some of the --param knobs used to > control it questionable at best or no longer applicable at worst. > > The fsm-maximum-phi-arguments param is unused and can be removed. > > The max-fsm-thread-length param is block based which is a bit redundant, > since we already restrict paths based on instruction estimates. > > The max-fsm-thread-paths restricts the total number of threadable paths > in a function. We probably don't need this. Besides, the forward > threader has no such restriction. > > OK pending tests? This causes a regression for me. I'm auto-building lots of GCC cross-compilers and use these to cross-build the Linux kernel. Using binutils/gas/gcc configured for --target=sh-linux (actual configuration for GCC is this: .../gcc/configure --target=sh-linux --enable-werror-always \ --enable-languages=all --disable-gcov \ --disable-shared --disable-threads \ --without-headers \ --prefix=/var/lib/laminar/run/gcc-sh-linux/13/toolchain-install ) Then, building Linux for a good number of default configurations for ARCH=sh and ARCH=arm, GCC will just loop: $ make ARCH=sh distclean $ cp arch/sh/configs/r7780mp_defconfig .config $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- oldconfig < /dev/null $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- prepare $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- all [...] sh-linux-gcc -Wp,-MMD,drivers/ata/.libata-core.o.d -nostdinc -isystem /tmp/testbed/install/lib/gcc/sh-linux/12.0.0/include -I./arch/sh/include -I./arch/sh/include/generated -I./include -I./arch/sh/include/uapi -I./arch/sh/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -m4 -m4-nofpu -m4a -m4a-nofpu -ml -mno-fdpic -Wa,-isa=sh4a-up -ffreestanding -I ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 -I ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I ./arch/sh/include/mach-common -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 -pipe -m4 -m4-nofpu -m4a -m4a-nofpu -ml -mno-fdpic -Wa,-isa=sh4a-up -ffreestanding -I ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 -I ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I ./arch/sh/include/mach-common -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-address-of-packed-member -O2 -fno-allow-store-data-races -Wframe-larger-than=1024 -fstack-protector-strong -Wimplicit-fallthrough=5 -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang -fno-stack-clash-protection -g -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-stringop-truncation -Wno-zero-length-bounds -Wno-array-bounds -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized -fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wno-packed-not-aligned -DKBUILD_MODFILE='"drivers/ata/libata"' -DKBUILD_BASENAME='"libata_core"' -DKBUILD_MODNAME='"libata"' -D__KBUILD_MODNAME=kmod_libata -c -o drivers/ata/libata-core.o drivers/ata/libata-core.c (gdb) bt #0 0x000000000100318e in vec<jump_thread_edge*, va_heap, vl_ptr>::operator[] (ix=0, this=0x50ee6e0) at ../../gcc/gcc/vec.h:1495 #1 back_jt_path_registry::adjust_paths_after_duplication (this=0x7ffdf8b6e868, curr_path_num=0) at ../../gcc/gcc/tree-ssa-threadupdate.c:2315 #2 0x0000000001003c0d in back_jt_path_registry::duplicate_thread_path (this=0x7ffdf8b6e868, entry=0x7f92651000c0, exit=<optimized out>, region=<optimized out>, n_region=8, current_path_no=0) at ../../gcc/gcc/tree-ssa-threadupdate.c:2546 #3 0x00000000010051e4 in back_jt_path_registry::update_cfg (this=0x7ffdf8b6e868) at ../../gcc/gcc/tree-ssa-threadupdate.c:2656 #4 0x0000000001003ecc in jt_path_registry::thread_through_all_blocks (this=0x7ffdf8b6e868, peel_loop_headers=<optimized out>) at ../../gcc/gcc/tree-ssa-threadupdate.c:2604 #5 0x0000000000ffb5a7 in back_threader_registry::thread_through_all_blocks (may_peel_loop_headers=true, this=0x7ffdf8b6e868) at ../../gcc/gcc/tree-ssa-threadbackward.c:556 #6 back_threader::thread_through_all_blocks (may_peel_loop_headers=true, this=0x7ffdf8b6e860) at ../../gcc/gcc/tree-ssa-threadbackward.c:501 #7 (anonymous namespace)::try_thread_blocks (fun=fun@entry=0x7f926eb389c0) at ../../gcc/gcc/tree-ssa-threadbackward.c:946 #8 0x0000000000ffb5eb in (anonymous namespace)::pass_thread_jumps::execute (this=<optimized out>, fun=0x7f926eb389c0) at ../../gcc/gcc/tree-ssa-threadbackward.c:954 #9 0x0000000000cfcdf8 in execute_one_pass (pass=0x2a980d0) at ../../gcc/gcc/passes.c:2567 #10 0x0000000000cfd770 in execute_pass_list_1 (pass=0x2a980d0) at ../../gcc/gcc/passes.c:2656 #11 0x0000000000cfd782 in execute_pass_list_1 (pass=0x2a95ab0) at ../../gcc/gcc/passes.c:2657 #12 0x0000000000cfd7a9 in execute_pass_list (fn=0x7f926eb389c0, pass=<optimized out>) at ../../gcc/gcc/passes.c:2667 #13 0x00000000009719c6 in cgraph_node::expand (this=0x7f926eb59550) at ../../gcc/gcc/context.h:48 #14 cgraph_node::expand (this=0x7f926eb59550) at ../../gcc/gcc/cgraphunit.c:1781 #15 0x0000000000972eb8 in expand_all_functions () at ../../gcc/gcc/cgraphunit.c:1992 #16 symbol_table::compile (this=0x7f9271f36000) at ../../gcc/gcc/cgraphunit.c:2356 #17 0x0000000000975948 in symbol_table::compile (this=0x7f9271f36000) at ../../gcc/gcc/cgraphunit.c:2269 #18 symbol_table::finalize_compilation_unit (this=0x7f9271f36000) at ../../gcc/gcc/cgraphunit.c:2537 #19 0x0000000000de87a6 in compile_file () at ../../gcc/gcc/toplev.c:477 #20 0x00000000007ca783 in do_compile (no_backend=false) at ../../gcc/gcc/toplev.c:2197 #21 toplev::main (this=this@entry=0x7ffdf8b6f00e, argc=<optimized out>, argc@entry=128, argv=<optimized out>, argv@entry=0x7ffdf8b6f118) at ../../gcc/gcc/toplev.c:2346 #22 0x00000000007cc74f in main (argc=128, argv=0x7ffdf8b6f118) at ../../gcc/gcc/main.c:39 Shall I open a PR for it? MfG, JBG --
That's odd. Yeah, please open a PR. Thanks. Aldy On Tue, Oct 19, 2021, 22:47 Jan-Benedict Glaw <jbglaw@lug-owl.de> wrote: > Hi Aldy! > > On Thu, 2021-10-14 16:25:48 +0200, Aldy Hernandez via Gcc-patches < > gcc-patches@gcc.gnu.org> wrote: > > The new backward threader makes some of the --param knobs used to > > control it questionable at best or no longer applicable at worst. > > > > The fsm-maximum-phi-arguments param is unused and can be removed. > > > > The max-fsm-thread-length param is block based which is a bit redundant, > > since we already restrict paths based on instruction estimates. > > > > The max-fsm-thread-paths restricts the total number of threadable paths > > in a function. We probably don't need this. Besides, the forward > > threader has no such restriction. > > > > OK pending tests? > > This causes a regression for me. I'm auto-building lots of GCC > cross-compilers and use these to cross-build the Linux kernel. > > Using binutils/gas/gcc configured for --target=sh-linux (actual > configuration for GCC is this: > > .../gcc/configure --target=sh-linux --enable-werror-always \ > --enable-languages=all --disable-gcov \ > --disable-shared --disable-threads \ > --without-headers \ > > --prefix=/var/lib/laminar/run/gcc-sh-linux/13/toolchain-install > ) > > Then, building Linux for a good number of default configurations for > ARCH=sh and ARCH=arm, GCC will just loop: > > $ make ARCH=sh distclean > $ cp arch/sh/configs/r7780mp_defconfig .config > $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- oldconfig < /dev/null > $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- prepare > $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- all > [...] > sh-linux-gcc -Wp,-MMD,drivers/ata/.libata-core.o.d -nostdinc -isystem > /tmp/testbed/install/lib/gcc/sh-linux/12.0.0/include -I./arch/sh/include > -I./arch/sh/include/generated -I./include -I./arch/sh/include/uapi > -I./arch/sh/include/generated/uapi -I./include/uapi > -I./include/generated/uapi -include ./include/linux/compiler-version.h > -include ./include/linux/kconfig.h -include > ./include/linux/compiler_types.h -D__KERNEL__ -m4 -m4-nofpu -m4a -m4a-nofpu > -ml -mno-fdpic -Wa,-isa=sh4a-up -ffreestanding -I > ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 -I > ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I > ./arch/sh/include/mach-common -fmacro-prefix-map=./= -Wall -Wundef > -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common > -fshort-wchar -fno-PIE -Werror=implicit-function-declaration > -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 > -pipe -m4 -m4-nofpu -m4a -m4a-nofpu -ml -mno-fdpic -Wa,-isa=sh4a-up > -ffreestanding -I ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 > -I ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I > ./arch/sh/include/mach-common -fno-delete-null-pointer-checks > -Wno-frame-address -Wno-format-truncation -Wno-format-overflow > -Wno-address-of-packed-member -O2 -fno-allow-store-data-races > -Wframe-larger-than=1024 -fstack-protector-strong -Wimplicit-fallthrough=5 > -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable > -fomit-frame-pointer -ftrivial-auto-var-init=zero > -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang > -fno-stack-clash-protection -g -Wdeclaration-after-statement -Wvla > -Wno-pointer-sign -Wno-stringop-truncation -Wno-zero-length-bounds > -Wno-array-bounds -Wno-stringop-overflow -Wno-restrict > -Wno-maybe-uninitialized -fno-strict-overflow -fno-stack-check > -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types > -Werror=designated-init -Wno-packed-not-aligned > -DKBUILD_MODFILE='"drivers/ata/libata"' -DKBUILD_BASENAME='"libata_core"' > -DKBUILD_MODNAME='"libata"' -D__KBUILD_MODNAME=kmod_libata -c -o > drivers/ata/libata-core.o drivers/ata/libata-core.c > > > (gdb) bt > #0 0x000000000100318e in vec<jump_thread_edge*, va_heap, > vl_ptr>::operator[] (ix=0, this=0x50ee6e0) at ../../gcc/gcc/vec.h:1495 > #1 back_jt_path_registry::adjust_paths_after_duplication > (this=0x7ffdf8b6e868, curr_path_num=0) at > ../../gcc/gcc/tree-ssa-threadupdate.c:2315 > #2 0x0000000001003c0d in back_jt_path_registry::duplicate_thread_path > (this=0x7ffdf8b6e868, entry=0x7f92651000c0, exit=<optimized out>, > region=<optimized out>, n_region=8, > current_path_no=0) at ../../gcc/gcc/tree-ssa-threadupdate.c:2546 > #3 0x00000000010051e4 in back_jt_path_registry::update_cfg > (this=0x7ffdf8b6e868) at ../../gcc/gcc/tree-ssa-threadupdate.c:2656 > #4 0x0000000001003ecc in jt_path_registry::thread_through_all_blocks > (this=0x7ffdf8b6e868, peel_loop_headers=<optimized out>) at > ../../gcc/gcc/tree-ssa-threadupdate.c:2604 > #5 0x0000000000ffb5a7 in > back_threader_registry::thread_through_all_blocks > (may_peel_loop_headers=true, this=0x7ffdf8b6e868) at > ../../gcc/gcc/tree-ssa-threadbackward.c:556 > #6 back_threader::thread_through_all_blocks (may_peel_loop_headers=true, > this=0x7ffdf8b6e860) at ../../gcc/gcc/tree-ssa-threadbackward.c:501 > #7 (anonymous namespace)::try_thread_blocks (fun=fun@entry=0x7f926eb389c0) > at ../../gcc/gcc/tree-ssa-threadbackward.c:946 > #8 0x0000000000ffb5eb in (anonymous > namespace)::pass_thread_jumps::execute (this=<optimized out>, > fun=0x7f926eb389c0) at ../../gcc/gcc/tree-ssa-threadbackward.c:954 > #9 0x0000000000cfcdf8 in execute_one_pass (pass=0x2a980d0) at > ../../gcc/gcc/passes.c:2567 > #10 0x0000000000cfd770 in execute_pass_list_1 (pass=0x2a980d0) at > ../../gcc/gcc/passes.c:2656 > #11 0x0000000000cfd782 in execute_pass_list_1 (pass=0x2a95ab0) at > ../../gcc/gcc/passes.c:2657 > #12 0x0000000000cfd7a9 in execute_pass_list (fn=0x7f926eb389c0, > pass=<optimized out>) at ../../gcc/gcc/passes.c:2667 > #13 0x00000000009719c6 in cgraph_node::expand (this=0x7f926eb59550) at > ../../gcc/gcc/context.h:48 > #14 cgraph_node::expand (this=0x7f926eb59550) at > ../../gcc/gcc/cgraphunit.c:1781 > #15 0x0000000000972eb8 in expand_all_functions () at > ../../gcc/gcc/cgraphunit.c:1992 > #16 symbol_table::compile (this=0x7f9271f36000) at > ../../gcc/gcc/cgraphunit.c:2356 > #17 0x0000000000975948 in symbol_table::compile (this=0x7f9271f36000) at > ../../gcc/gcc/cgraphunit.c:2269 > #18 symbol_table::finalize_compilation_unit (this=0x7f9271f36000) at > ../../gcc/gcc/cgraphunit.c:2537 > #19 0x0000000000de87a6 in compile_file () at ../../gcc/gcc/toplev.c:477 > #20 0x00000000007ca783 in do_compile (no_backend=false) at > ../../gcc/gcc/toplev.c:2197 > #21 toplev::main (this=this@entry=0x7ffdf8b6f00e, argc=<optimized out>, > argc@entry=128, argv=<optimized out>, argv@entry=0x7ffdf8b6f118) at > ../../gcc/gcc/toplev.c:2346 > #22 0x00000000007cc74f in main (argc=128, argv=0x7ffdf8b6f118) at > ../../gcc/gcc/main.c:39 > > > Shall I open a PR for it? > > MfG, JBG > > -- >
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 03234c887dc..69993270b39 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14464,14 +14464,6 @@ Emit instrumentation calls to __tsan_func_entry() and __tsan_func_exit(). Maximum number of instructions to copy when duplicating blocks on a finite state automaton jump thread path. -@item max-fsm-thread-length -Maximum number of basic blocks on a finite state automaton jump thread -path. - -@item max-fsm-thread-paths -Maximum number of new jump thread paths to create for a finite state -automaton. - @item parloops-chunk-size Chunk size of omp schedule for loops parallelized by parloops. @@ -14626,10 +14618,6 @@ The maximum depth of recursive inlining for non-inline functions. Scale factor to apply to the number of statements in a threading path when comparing to the number of (scaled) blocks. -@item fsm-maximum-phi-arguments -Maximum number of arguments a PHI may have before the FSM threader -will not try to thread through its block. - @item uninit-control-dep-attempts Maximum number of nested calls to search for control dependencies during uninitialized variable analysis. diff --git a/gcc/params.opt b/gcc/params.opt index 84d642d72c5..06a6fdc9deb 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -173,10 +173,6 @@ Common Joined UInteger Var(param_ranger_logical_depth) Init(6) IntegerRange(1, 9 Maximum depth of logical expression evaluation ranger will look through when evaluating outgoing edge ranges. --param=fsm-maximum-phi-arguments= -Common Joined UInteger Var(param_fsm_maximum_phi_arguments) Init(100) IntegerRange(1, 999999) Param Optimization -Maximum number of arguments a PHI may have before the FSM threader will not try to thread through its block. - -param=fsm-scale-path-blocks= Common Joined UInteger Var(param_fsm_scale_path_blocks) Init(3) IntegerRange(1, 10) Param Optimization Scale factor to apply to the number of blocks in a threading path when comparing to the number of (scaled) statements. @@ -537,18 +533,10 @@ The maximum number of nested indirect inlining performed by early inliner. Common Joined UInteger Var(param_max_fields_for_field_sensitive) Param Maximum number of fields in a structure before pointer analysis treats the structure as a single variable. --param=max-fsm-thread-length= -Common Joined UInteger Var(param_max_fsm_thread_length) Init(10) IntegerRange(1, 999999) Param Optimization -Maximum number of basic blocks on a finite state automaton jump thread path. - -param=max-fsm-thread-path-insns= Common Joined UInteger Var(param_max_fsm_thread_path_insns) Init(100) IntegerRange(1, 999999) Param Optimization Maximum number of instructions to copy when duplicating blocks on a finite state automaton jump thread path. --param=max-fsm-thread-paths= -Common Joined UInteger Var(param_max_fsm_thread_paths) Init(50) IntegerRange(1, 999999) Param Optimization -Maximum number of new jump thread paths to create for a finite state automaton. - -param=max-gcse-insertion-ratio= Common Joined UInteger Var(param_max_gcse_insertion_ratio) Init(20) Param Optimization The maximum ratio of insertions to deletions of expressions in GCSE. diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c index 1999ccf4834..7c2c1112bee 100644 --- a/gcc/tree-ssa-threadbackward.c +++ b/gcc/tree-ssa-threadbackward.c @@ -52,12 +52,11 @@ along with GCC; see the file COPYING3. If not see class back_threader_registry { public: - back_threader_registry (int max_allowable_paths); + back_threader_registry (); bool register_path (const vec<basic_block> &, edge taken); bool thread_through_all_blocks (bool may_peel_loop_headers); private: back_jt_path_registry m_lowlevel_registry; - const int m_max_allowable_paths; int m_threaded_paths; }; @@ -120,8 +119,7 @@ private: const edge back_threader::UNREACHABLE_EDGE = (edge) -1; back_threader::back_threader (bool speed_p) - : m_registry (param_max_fsm_thread_paths), - m_profit (speed_p), + : m_profit (speed_p), m_solver (m_ranger, /*resolve=*/false) { m_last_stmt = NULL; @@ -547,8 +545,7 @@ back_threader::debug () dump (stderr); } -back_threader_registry::back_threader_registry (int max_allowable_paths) - : m_max_allowable_paths (max_allowable_paths) +back_threader_registry::back_threader_registry () { m_threaded_paths = 0; } @@ -594,15 +591,6 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path, if (m_path.length () <= 1) return false; - if (m_path.length () > (unsigned) param_max_fsm_thread_length) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, " FAIL: Jump-thread path not considered: " - "the number of basic blocks on the path " - "exceeds PARAM_MAX_FSM_THREAD_LENGTH.\n"); - return false; - } - int n_insns = 0; gimple_stmt_iterator gsi; loop_p loop = m_path[0]->loop_father; @@ -885,15 +873,6 @@ bool back_threader_registry::register_path (const vec<basic_block> &m_path, edge taken_edge) { - if (m_threaded_paths > m_max_allowable_paths) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, " FAIL: Jump-thread path not considered: " - "the number of previously recorded paths to " - "thread exceeds PARAM_MAX_FSM_THREAD_PATHS.\n"); - return false; - } - vec<jump_thread_edge *> *jump_thread_path = m_lowlevel_registry.allocate_thread_path ();