Message ID | 20190916163914.macczibmd54rjqri@kam.mff.cuni.cz |
---|---|
State | New |
Headers | show |
Series | -O2 inliner returning 1/n: reduce EARLY_INLINING_INSNS for O1 and O2 | expand |
> * ipa-inline.c (want_early_inline_function_p): Use > PARAM_EARLY_INLINING_INSNS_O2. > * params.def (PARAM_EARLY_INLINING_INSNS_O2): New. > (PARAM_EARLY_INLINING_INSNS): Update documentation. > * invoke.texi (early-inlining-insns-O2): New. > (early-inlining-insns): Update documentation. Hi, this is a variant of patch with testsuite compensation I comitted today. There are some cases where we need early inlining to happen in order to get the even we look for. In most cases this will go back once late inlining autoinline, but I filled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91954 for missed vectorization issue and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91955 for Wstringop. Honza * doc/invoke.texi (early-inlining-insns-O2): Document. (early-inlining-insns): Update. * params.def (early-inlining-insns-O2): New bound. (early-inlining-insns): Update docs. * ipa-inline.c (want_early_inline_function_p): Use new bound. * g++.dg/tree-ssa/pr61034.C: Set early-inlining-insns-O2=14. * g++.dg/tree-ssa/pr8781.C: Likewise. * g++.dg/warn/Wstringop-truncation-1.C: Likewise. * gcc.dg/ipa/pr63416.c: likewise. * gcc.dg/vect/pr66142.c: Likewise. * gcc.dg/tree-ssa/ssa-thread-12.c: Mark compure_idf inline. Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 276272) +++ doc/invoke.texi (working copy) @@ -11291,9 +11291,17 @@ recursion depth can be guessed from the via a given call expression. This parameter limits inlining only to call expressions whose probability exceeds the given threshold (in percents). +@item early-inlining-insns-O2 +Specify growth that the early inliner can make. In effect it increases +the amount of inlining for code having a large abstraction penalty. +This is applied to functions compiled with @option{-O1} or @option{-O2} +optimization levels. + @item early-inlining-insns Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty. +This is applied to functions compiled with @option{-O3} or @option{-Ofast} +optimization levels. @item max-early-inliner-iterations Limit of iterations of the early inliner. This basically bounds Index: ipa-inline.c =================================================================== --- ipa-inline.c (revision 276272) +++ ipa-inline.c (working copy) @@ -641,6 +641,10 @@ want_early_inline_function_p (struct cgr { int growth = estimate_edge_growth (e); int n; + int early_inlining_insns = opt_for_fn (e->caller->decl, optimize) >= 3 + ? PARAM_VALUE (PARAM_EARLY_INLINING_INSNS) + : PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_O2); + if (growth <= PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE)) ; @@ -654,26 +658,28 @@ want_early_inline_function_p (struct cgr growth); want_inline = false; } - else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS)) + else if (growth > early_inlining_insns) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt, " will not early inline: %C->%C, " - "growth %i exceeds --param early-inlining-insns\n", - e->caller, callee, - growth); + "growth %i exceeds --param early-inlining-insns%s\n", + e->caller, callee, growth, + opt_for_fn (e->caller->decl, optimize) >= 3 + ? "" : "-O2"); want_inline = false; } else if ((n = num_calls (callee)) != 0 - && growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS)) + && growth * (n + 1) > early_inlining_insns) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt, " will not early inline: %C->%C, " - "growth %i exceeds --param early-inlining-insns " + "growth %i exceeds --param early-inlining-insns%s " "divided by number of calls\n", - e->caller, callee, - growth); + e->caller, callee, growth, + opt_for_fn (e->caller->decl, optimize) >= 3 + ? "" : "-O2"); want_inline = false; } } Index: params.def =================================================================== --- params.def (revision 276272) +++ params.def (working copy) @@ -233,8 +233,12 @@ DEFPARAM(PARAM_IPCP_UNIT_GROWTH, 10, 0, 0) DEFPARAM(PARAM_EARLY_INLINING_INSNS, "early-inlining-insns", - "Maximal estimated growth of function body caused by early inlining of single call.", + "Maximal estimated growth of function body caused by early inlining of single call with -O3 and -Ofast.", 14, 0, 0) +DEFPARAM(PARAM_EARLY_INLINING_INSNS_O2, + "early-inlining-insns-O2", + "Maximal estimated growth of function body caused by early inlining of single call with -O1 and -O2.", + 6, 0, 0) DEFPARAM(PARAM_LARGE_STACK_FRAME, "large-stack-frame", "The size of stack frame to be considered large.", Index: testsuite/g++.dg/tree-ssa/pr61034.C =================================================================== --- testsuite/g++.dg/tree-ssa/pr61034.C (revision 276272) +++ testsuite/g++.dg/tree-ssa/pr61034.C (working copy) @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-options "-O2 -fdump-tree-fre3 -fdump-tree-optimized -fdelete-null-pointer-checks" } +// { dg-options "-O2 -fdump-tree-fre3 -fdump-tree-optimized -fdelete-null-pointer-checks --param early-inlining-insns-O2=14" } #define assume(x) if(!(x))__builtin_unreachable() Index: testsuite/g++.dg/tree-ssa/pr8781.C =================================================================== --- testsuite/g++.dg/tree-ssa/pr8781.C (revision 276272) +++ testsuite/g++.dg/tree-ssa/pr8781.C (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O -fno-tree-sra -fdump-tree-fre1" } */ +/* { dg-options "-O -fno-tree-sra -fdump-tree-fre1 --param early-inlining-insns-O2=14" } */ int f(); Index: testsuite/g++.dg/warn/Wstringop-truncation-1.C =================================================================== --- testsuite/g++.dg/warn/Wstringop-truncation-1.C (revision 276272) +++ testsuite/g++.dg/warn/Wstringop-truncation-1.C (working copy) @@ -1,7 +1,7 @@ /* PR/tree-optimization/84480 - bogus -Wstringop-truncation despite assignment with an inlined string literal { dg-do compile } - { dg-options "-O2 -Wstringop-truncation" } */ + { dg-options "-O2 -Wstringop-truncation --param early-inlining-insns-O2=14" } */ #include <string.h> Index: testsuite/gcc.dg/ipa/pr63416.c =================================================================== --- testsuite/gcc.dg/ipa/pr63416.c (revision 276272) +++ testsuite/gcc.dg/ipa/pr63416.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-options "-O2 -fdump-tree-optimized --param early-inlining-insns-O2=14" } */ #define _UNUSED_ __attribute__((__unused__)) typedef int TEST_F30 (int *v); Index: testsuite/gcc.dg/tree-ssa/ssa-thread-12.c =================================================================== --- testsuite/gcc.dg/tree-ssa/ssa-thread-12.c (revision 276272) +++ testsuite/gcc.dg/tree-ssa/ssa-thread-12.c (working copy) @@ -56,7 +56,7 @@ bmp_iter_and_compl (bitmap_iterator * bi } extern int VEC_int_base_length (VEC_int_base *); -bitmap +inline bitmap compute_idf (bitmap def_blocks, bitmap_head * dfs) { bitmap_iterator bi; Index: testsuite/gcc.dg/vect/pr66142.c =================================================================== --- testsuite/gcc.dg/vect/pr66142.c (revision 276272) +++ testsuite/gcc.dg/vect/pr66142.c (working copy) @@ -1,6 +1,6 @@ /* PR middle-end/66142 */ /* { dg-do compile } */ -/* { dg-additional-options "-ffast-math -fopenmp-simd" } */ +/* { dg-additional-options "-ffast-math -fopenmp-simd --param early-inlining-insns-O2=14" } */ /* { dg-additional-options "-mavx" { target avx_runtime } } */ struct A { float x, y; };
On Tue, Oct 01, 2019 at 07:04:53PM +0200, Jan Hubicka wrote: > * gcc.dg/tree-ssa/ssa-thread-12.c: Mark compure_idf inline. > --- testsuite/gcc.dg/tree-ssa/ssa-thread-12.c (revision 276272) > +++ testsuite/gcc.dg/tree-ssa/ssa-thread-12.c (working copy) > @@ -56,7 +56,7 @@ bmp_iter_and_compl (bitmap_iterator * bi > } > > extern int VEC_int_base_length (VEC_int_base *); > -bitmap > +inline bitmap > compute_idf (bitmap def_blocks, bitmap_head * dfs) > { > bitmap_iterator bi; Neither this change nor the followup in http://gcc.gnu.org/r276418 actually works. PASS: gcc.dg/tree-ssa/ssa-thread-12.c (test for excess errors) gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread2 "FSM" gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread3 "FSM" gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread4 "FSM" We shouldn't really have UNRESOLVED testcases in the testsuite. The reason is that with the static __inline__ bitmap in there, no function is actually emitted as nothing uses it, so there are no dumps other than original (and 4 empty ones) with -fdump-tree-all. Jakub
> On Tue, Oct 01, 2019 at 07:04:53PM +0200, Jan Hubicka wrote: > > * gcc.dg/tree-ssa/ssa-thread-12.c: Mark compure_idf inline. > > > --- testsuite/gcc.dg/tree-ssa/ssa-thread-12.c (revision 276272) > > +++ testsuite/gcc.dg/tree-ssa/ssa-thread-12.c (working copy) > > @@ -56,7 +56,7 @@ bmp_iter_and_compl (bitmap_iterator * bi > > } > > > > extern int VEC_int_base_length (VEC_int_base *); > > -bitmap > > +inline bitmap > > compute_idf (bitmap def_blocks, bitmap_head * dfs) > > { > > bitmap_iterator bi; > > Neither this change nor the followup in http://gcc.gnu.org/r276418 > actually works. > PASS: gcc.dg/tree-ssa/ssa-thread-12.c (test for excess errors) > gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist > UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread2 "FSM" > gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist > UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread3 "FSM" > gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist > UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread4 "FSM" > > We shouldn't really have UNRESOLVED testcases in the testsuite. > > The reason is that with the static __inline__ bitmap in there, no function > is actually emitted as nothing uses it, so there are no dumps other than > original (and 4 empty ones) with -fdump-tree-all. Aha, sorry for that. I was looking for FAILs and missed the UNRESOLVED. I will revert that change and go with --param then. The testcase is somewhat fragile depending on particular chain of inliner decisions. Honza > > Jakub
Index: ipa-inline.c =================================================================== --- ipa-inline.c (revision 275716) +++ ipa-inline.c (working copy) @@ -641,6 +641,10 @@ want_early_inline_function_p (struct cgr { int growth = estimate_edge_growth (e); int n; + int early_inlining_insns = opt_for_fn (e->caller->decl, optimize) >= 3 + ? PARAM_VALUE (PARAM_EARLY_INLINING_INSNS) + : PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_O2); + if (growth <= PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE)) ; @@ -654,26 +658,28 @@ want_early_inline_function_p (struct cgr growth); want_inline = false; } - else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS)) + else if (growth > early_inlining_insns) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt, " will not early inline: %C->%C, " - "growth %i exceeds --param early-inlining-insns\n", - e->caller, callee, - growth); + "growth %i exceeds --param early-inlining-insns%s\n", + e->caller, callee, growth, + opt_for_fn (e->caller->decl, optimize) >= 3 + ? "" : "-O2"); want_inline = false; } else if ((n = num_calls (callee)) != 0 - && growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS)) + && growth * (n + 1) > early_inlining_insns) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt, " will not early inline: %C->%C, " - "growth %i exceeds --param early-inlining-insns " + "growth %i exceeds --param early-inlining-insns%s " "divided by number of calls\n", - e->caller, callee, - growth); + e->caller, callee, growth, + opt_for_fn (e->caller->decl, optimize) >= 3 + ? "" : "-O2"); want_inline = false; } } Index: params.def =================================================================== --- params.def (revision 275716) +++ params.def (working copy) @@ -233,8 +233,12 @@ DEFPARAM(PARAM_IPCP_UNIT_GROWTH, 10, 0, 0) DEFPARAM(PARAM_EARLY_INLINING_INSNS, "early-inlining-insns", - "Maximal estimated growth of function body caused by early inlining of single call.", + "Maximal estimated growth of function body caused by early inlining of single call with -O3 and -Ofast.", 14, 0, 0) +DEFPARAM(PARAM_EARLY_INLINING_INSNS_O2, + "early-inlining-insns-O2", + "Maximal estimated growth of function body caused by early inlining of single call with -O1 and -O2.", + 6, 0, 0) DEFPARAM(PARAM_LARGE_STACK_FRAME, "large-stack-frame", "The size of stack frame to be considered large.", Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 275716) +++ doc/invoke.texi (working copy) @@ -11290,9 +11290,17 @@ recursion depth can be guessed from the via a given call expression. This parameter limits inlining only to call expressions whose probability exceeds the given threshold (in percents). +@item early-inlining-insns-O2 +Specify growth that the early inliner can make. In effect it increases +the amount of inlining for code having a large abstraction penalty. +This is applied to functions compiled with @option{-O1} or @option{-O2} +optimization levels. + @item early-inlining-insns Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty. +This is applied to functions compiled with @option{-O3} or @option{-Ofast} +optimization levels. @item max-early-inliner-iterations Limit of iterations of the early inliner. This basically bounds