Message ID: CAFiYyc1AHg7u0VA2XiZ50ozY6=BaPQx7Dct9-7M1rUzdgNj9Lw@mail.gmail.com
State: New
On 28/10/2011, at 11:43 PM, Richard Guenther wrote:

> On Fri, Oct 28, 2011 at 1:05 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
>>
>> OK for trunk?
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index f056d3d..4738b28 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2416,7 +2416,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
>        tree_lowering_passes (fndecl);
>        bitmap_obstack_initialize (NULL);
>        if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
> -        execute_pass_list (pass_early_local_passes.pass.sub);
> +        execute_early_local_passes_for_current_function ();
>        bitmap_obstack_release (NULL);
>        pop_cfun ();
>        current_function_decl = NULL;
> @@ -2441,7 +2441,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
>        gimple_register_cfg_hooks ();
>        bitmap_obstack_initialize (NULL);
>        if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
> -        execute_pass_list (pass_early_local_passes.pass.sub);
> +        execute_early_local_passes_for_current_function ();
>
> I think these should only execute the lowering pieces of early local passes,
> let me see if that's properly split ...
>
> @@ -255,7 +255,7 @@ cgraph_process_new_functions (void)
>            /* When not optimizing, be sure we run early local passes anyway
>               to expand OMP.  */
>            || !optimize)
> -        execute_pass_list (pass_early_local_passes.pass.sub);
> +        execute_early_local_passes_for_current_function ();
>
> similar.

In the case of tree_lowering_passes, I agree.  But with the calls to early_local_passes from cgraph_add_new_function/cgraph_process_new_functions we ought to run the full list of optimizations to make the bodies of new functions ready for inlining into existing functions.  Note that cgraph_add_new_function() explicitly calls tree_lowering_passes and then invokes early_local_passes.

What do you think?  If we want to pursue this, it should be in a separate patch anyway.
>
> About all this -suffix stuff, I'd like to have the iterations simply re-use
> the existing dump-files, thus statically sub-divide pass_early_local_passes
> like

I considered this when I was implementing and debugging iterative support, and it is better to keep the dumps for the main and iterative parts separate.  These are two different IPA passes, and their sub-passes should use separate dumps.  Given that the set of sub-passes of early_local_passes_main is substantially different from the set of sub-passes of early_local_passes_iter (the former has a bunch of init/expand/lower passes in it), mixing the dumps would be mighty confusing.  Without knowing the exact ordering of passes one would not be able to construct a chronological picture of the optimizations.  Separating the dumps of sub-passes of different IPA passes simplifies understanding of the optimization ordering.

>
>   NEXT_PASS (pass_early_local_lowering_passes);
>   {
>     NEXT_PASS (pass_fixup_cfg);
>     NEXT_PASS (pass_init_datastructures);
>     NEXT_PASS (pass_expand_omp);
>
>     NEXT_PASS (pass_referenced_vars);
>     NEXT_PASS (pass_build_ssa);
>     NEXT_PASS (pass_lower_vector);
>     NEXT_PASS (pass_early_warn_uninitialized);
>   }
>   /* The following you maybe iterate.  */
>   NEXT_PASS (pass_early_local_optimizations);
>   {
>     NEXT_PASS (pass_rebuild_cgraph_edges);
>     NEXT_PASS (pass_inline_parameters);
>     NEXT_PASS (pass_early_inline);
>     NEXT_PASS (pass_all_early_optimizations);
>     {
>       struct opt_pass **p = &pass_all_early_optimizations.pass.sub;
>       NEXT_PASS (pass_remove_cgraph_callee_edges);
>       NEXT_PASS (pass_rename_ssa_copies);
>       NEXT_PASS (pass_ccp);
>       NEXT_PASS (pass_forwprop);
>       /* pass_build_ealias is a dummy pass that ensures that we
>          execute TODO_rebuild_alias at this point.  Re-building
>          alias information also rewrites no longer addressed
>          locals into SSA form if possible.  */
>       NEXT_PASS (pass_build_ealias);
>       NEXT_PASS (pass_sra_early);
>       NEXT_PASS (pass_fre);
>       NEXT_PASS (pass_copy_prop);
>       NEXT_PASS (pass_merge_phi);
>       NEXT_PASS (pass_cd_dce);
>       NEXT_PASS (pass_early_ipa_sra);
>       NEXT_PASS (pass_tail_recursion);
>       NEXT_PASS (pass_convert_switch);
>       NEXT_PASS (pass_cleanup_eh);
>       NEXT_PASS (pass_profile);
>       NEXT_PASS (pass_local_pure_const);
>     }
>   }
>   NEXT_PASS (pass_late_early_local_optimizations)
>   {
>     /* Split functions creates parts that are not run through
>        early optimizations again.  It is thus a good idea to do this
>        late.  */
>     NEXT_PASS (pass_split_functions);
>     NEXT_PASS (pass_release_ssa_names);
>     NEXT_PASS (pass_rebuild_cgraph_edges);
>     NEXT_PASS (pass_inline_parameters);
>   }

This simple division into three stages will not work.  On the one hand, one needs to add fixup/cleanup passes at the beginning and end of every SIMPLE_IPA pass; without those, cgraph_verify() gets upset very easily.  On the other hand, when iterative optimizations are not enabled and there is no need to decouple the main and late parts, these additional fixup/cleanup passes would be unnecessary overhead.  Also, the above division effectively creates three separate SIMPLE_IPA passes, and I don't know what the effect of that will be.

The pass formation and scheduling in the patch I posted may not look as pretty as what you describe above, but it is reasonably clean and easy to maintain.  One needs to change only the passes in the main part, and those changes are automatically picked up in the iterative and late parts.

>
> +++ b/gcc/toplev.c
> @@ -1228,7 +1228,6 @@ general_init (const char *argv0)
>    /* This must be done after global_init_params but before argument
>       processing.  */
>    init_ggc_heuristics();
> -  init_optimization_passes ();
>    statistics_early_init ();
>    finish_params ();
>  }
> @@ -1989,6 +1988,8 @@ toplev_main (int argc, char **argv)
>        save_decoded_options, save_decoded_options_count,
>        UNKNOWN_LOCATION, global_dc);
>
> +  init_optimization_passes ();
> +
>    handle_common_deferred_options ();
>
>    init_local_tick ();
>
> any reason for this?

Yes, this moves init_optimization_passes() after the processing of --param options.  This is needed to be able to use PARAM_VALUE in pass initialization.

> +  /* Don't recurse or wonder on to the next pass when running
> +     execute_ipa_pass_list below.  */
> +  current->execute = NULL;
> +  current->next = NULL;
>
> that looks awkward ;)  Shouldn't instead
>
> +      execute_ipa_pass_list (current);
>
> not do what it does?  Thus, split that into two pieces, one that we
> can iterate w/o the fiddling with current->?

The choice is between the above and duplicating the post-processing logic of execute_ipa_pass_list (i.e., the call to cgraph_process_new_functions) in the iteration loop.  I agree that the above is a tad awkward, but it is cleaner than a copy-paste of the final part of execute_ipa_pass_list.

> I like this variant a lot better than the last one - still it lacks any
> analysis-based justification for iteration (see my reply to Matt on
> what I discussed with Honza).

Yes, having a way to tell whether a function has significantly changed would be awesome.  My approach here would be to make inline_parameters output feedback on how much the size/time metrics have changed for a function since the previous run.  If the change is above X%, then queue the function's callers for more optimizations.  Similarly, Martin's rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue new direct callees and the current function for another iteration if new direct edges were resolved.  But such analysis is way out of the scope of this patch, which adds the infrastructure for iterative optimizations to be possible and reliable.
> Thus, I don't think we want to
> merge this in its current form or in this stage1.

What is the benefit of pushing this to a later release?  If anything, merging the support for iterative optimizations now will allow us to consider adding the wonderful smartness to it later.  In the meantime, substituting that smartness with a knob is still a great alternative.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
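[Editor's note: the feedback scheme Maxim sketches above -- re-queue a function's callers only while its size/time metrics keep moving by more than X% between runs -- can be modeled in a few lines. The sketch below is a toy simulation, not GCC code: the call-graph shape, the `optimize` callback, and the 5% threshold are all invented for illustration.]

```python
# Toy model of feedback-driven iteration: a function's callers are
# re-queued only while its size estimate changes by more than a given
# fraction between runs.  All names here are hypothetical.

def iterate_with_feedback(callers, size, optimize, threshold=0.05, max_iters=10):
    """callers maps fn -> set of caller fns; size maps fn -> size estimate;
    optimize(fn, size) returns fn's new size estimate."""
    worklist = set(size)
    iters = 0
    while worklist and iters < max_iters:
        iters += 1
        next_round = set()
        for fn in sorted(worklist):
            old = size[fn]
            size[fn] = optimize(fn, size)
            # Re-queue the callers only on a change above the threshold.
            if old and abs(size[fn] - old) / old > threshold:
                next_round |= callers.get(fn, set())
        worklist = next_round
    return size, iters

# Demo: "helper" shrinks a lot on its first run (say, a callee got
# inlined), which makes a second run over its caller "main" worthwhile.
def _toy_optimize(fn, sizes):
    if fn == "helper":
        return max(50, sizes["helper"] - 50)
    if fn == "main":
        return 80 if sizes["helper"] <= 50 else 100
    return sizes[fn]

final, iters = iterate_with_feedback({"helper": {"main"}},
                                     {"main": 100, "helper": 100},
                                     _toy_optimize)
```

With these numbers the process settles after two rounds: the second round revisits only helper's caller, and no third round is queued -- which is the kind of self-limiting behavior the X% heuristic is meant to provide.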
On Sat, 29 Oct 2011, Maxim Kuvyrkov wrote:

>> I like this variant a lot better than the last one - still it lacks any
>> analysis-based justification for iteration (see my reply to Matt on
>> what I discussed with Honza).
>
> Yes, having a way to tell whether a function has significantly changed
> would be awesome.  My approach here would be to make inline_parameters
> output feedback on how much the size/time metrics have changed for a
> function since the previous run.  If the change is above X%, then queue
> the function's callers for more optimizations.  Similarly, Martin's
> rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue
> new direct callees and the current function for another iteration if new
> direct edges were resolved.

Figuring out the heuristic will need decent testing on a few projects to figure out what the "sweet spot" is (smallest binary for time/passes spent) for a given codebase.  With a few data points, a reasonable stab at the metrics you mention can be had that would not terminate the iterations before the known optimal number of passes.  Without those data points, it seems like making sure the metrics allow those "sweet spots" to be attained will be difficult.

>> Thus, I don't think we want to
>> merge this in its current form or in this stage1.
>
> What is the benefit of pushing this to a later release?  If anything,
> merging the support for iterative optimizations now will allow us to
> consider adding the wonderful smartness to it later.  In the meantime,
> substituting that smartness with a knob is still a great alternative.

I agree (of course).  Having the knob will be very useful for testing and determining the acceptance criteria for the later "smartness".  While terminating early would be a nice optimization, the feature is still intrinsically useful and deployable without it.  In addition, when using LTO on nearly all the projects/modules I tested on, 3+ passes were always productive.  To be fair, when not using LTO, beyond 2-3 passes did not often produce improvements unless individual compilation units were enormous.

There was also the question of whether some of the improvements seen with multiple passes were indicative of deficiencies in early inlining, CFG, SRA, etc.  If the knob is available, I'm happy to continue testing on the same projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl, scummvm, binutils, etc.) and write a report on what I observe as "suspicious" improvements that perhaps should be caught/made in a single pass.

It's worth noting again that while this is a useful feature in and of itself (especially when combined with LTO), it's *extremely* useful when coupled with the de-virtualization improvements submitted in other threads.  The examples submitted for inclusion in the test suite aren't academic -- they are reductions of real-world performance issues from a mature (and shipping) C++-based networking product.  Any C++ codebase that employs physical separation in its design via Factory patterns, Interface Segregation, and/or Dependency Inversion will likely see improvements.  To me, these enhancements combine to form one of the biggest leaps I've seen in C++ code optimization -- code that can be clean, OO, *and* fast.

Richard: If there's any additional testing or information I can reasonably provide to help get this in for this stage1, let me know.

Thanks!

--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt
Hi,

On Fri, Oct 28, 2011 at 04:06:20PM -0700, Matt wrote:
> ...
>
> I agree (of course).  Having the knob will be very useful for testing
> and determining the acceptance criteria for the later "smartness".
> While terminating early would be a nice optimization, the feature is
> still intrinsically useful and deployable without it.  In addition,
> when using LTO on nearly all the projects/modules I tested on, 3+
> passes were always productive.  To be fair, when not using LTO,
> beyond 2-3 passes did not often produce improvements unless
> individual compilation units were enormous.

I'm quite surprised you get extra benefit with LTO, since early optimizations are exactly the part of the middle end which should produce the same results, LTO or not.  So the only way I can imagine this can happen is that the inlining analysis gets somehow a much better input and then can make much bigger use of it.  If this is because of extra early inlining, we might try to catch these situations when doing IPA inlining decisions, which would work regardless of any iteration-number cut-off.  If it is because of something else, it's probably better to (at least try to) tweak the passes and/or the inlining analysis to understand each other straight away.

> There was also the question of whether some of the improvements seen with
> multiple passes were indicative of deficiencies in early inlining,
> CFG, SRA,

SRA, because it is not flow-sensitive in any way, unfortunately sometimes produces useless statements which then need to be cleaned up by forwprop (and possibly dse and others).  We've already talked with Richi about this and agreed that the early one should be dumbed down a little to produce much less of these.  I'm afraid I won't be able to submit a patch doing that during this stage 1, though.

> etc.  If the knob is available, I'm happy to continue
> testing on the same projects I've filed recent LTO/graphite bugs
> against (glib, zlib, openssl, scummvm, binutils, etc) and write a
> report on what I observe as "suspicious" improvements that perhaps
> should be caught/made in a single pass.
>
> It's worth noting again that while this is a useful feature in and
> of itself (especially when combined with LTO), it's *extremely*
> useful when coupled with the de-virtualization improvements
> submitted in other threads.  The examples submitted for inclusion in
> the test suite aren't academic -- they are reductions of real-world
> performance issues from a mature (and shipping) C++-based networking
> product.  Any C++ codebase that employs physical separation in its
> design via Factory patterns, Interface Segregation, and/or
> Dependency Inversion will likely see improvements.  To me, these
> enhancements combine to form one of the biggest leaps I've seen in
> C++ code optimization -- code that can be clean, OO, *and* fast.

Well, while I'd understand that whenever there is a new direct call graph edge, trying early inlining again might help or save some work for the full inlining, I think that we should rather try to enhance the current IPA infrastructure than grow another one out of the early optimizations, especially if we aim at LTO -- iterating early optimizations will not help reduce abstraction if it is spread across a number of compilation units.

Martin
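[Editor's note: Martin's SRA observation above -- a flow-insensitive pass emitting copies that a later cleanup must delete -- is easy to model, and the model shows why one targeted cleanup right after the offending pass already reaches a fixed point, with nothing left for a whole-pipeline iteration to find. Everything below (the tuple IR and both passes) is an invented sketch, not GCC's GIMPLE.]

```python
# Toy IR: a list of tuples such as ("load", dst, field), ("copy", dst, src),
# ("ret", var).  "sra" scalarizes loads but, being flow-insensitive, emits
# a useless temporary copy each time (Martin's complaint); "forwprop_dce"
# propagates copies and deletes the dead ones.

def sra(ir):
    """Scalarize aggregate loads, naively introducing a temp copy per load."""
    out = []
    for op, *args in ir:
        if op == "load":
            dst, field = args
            out.append(("copy", dst + "_t", field))  # often-useless temp
            out.append(("copy", dst, dst + "_t"))
        else:
            out.append((op, *args))
    return out

def forwprop_dce(ir):
    """Forward-propagate straight-line copies, then drop unused copies."""
    env, prop = {}, []
    for op, *args in ir:
        if op == "copy":
            dst, src = args
            env[dst] = env.get(src, src)       # chase copy-of-copy chains
            prop.append(("copy", dst, env[dst]))
        elif op == "ret":
            (v,) = args
            prop.append(("ret", env.get(v, v)))
        else:
            prop.append((op, *args))
    used = {a for op, *args in prop if op == "ret" for a in args}
    return [i for i in prop if not (i[0] == "copy" and i[1] not in used)]

before = [("load", "x", "s_f"), ("ret", "x")]
after = forwprop_dce(sra(before))   # one cleanup run after SRA
```

Here a single forwprop/DCE run directly after SRA leaves `[("ret", "s_f")]`, and running it again changes nothing -- i.e. the garbage is cleaned by pass placement, not by brute-force iteration of the whole pipeline.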
On Sat, Oct 29, 2011 at 1:06 AM, Matt <matt@use.net> wrote:
> On Sat, 29 Oct 2011, Maxim Kuvyrkov wrote:
>
>>> I like this variant a lot better than the last one - still it lacks any
>>> analysis-based justification for iteration (see my reply to Matt on
>>> what I discussed with Honza).
>>
>> Yes, having a way to tell whether a function has significantly changed
>> would be awesome.  My approach here would be to make inline_parameters
>> output feedback on how much the size/time metrics have changed for a
>> function since the previous run.  If the change is above X%, then queue
>> the function's callers for more optimizations.  Similarly, Martin's
>> rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue new
>> direct callees and the current function for another iteration if new direct
>> edges were resolved.
>
> Figuring out the heuristic will need decent testing on a few projects to
> figure out what the "sweet spot" is (smallest binary for time/passes spent)
> for a given codebase.  With a few data points, a reasonable stab at the
> metrics you mention can be had that would not terminate the iterations
> before the known optimal number of passes.  Without those data points, it
> seems like making sure the metrics allow those "sweet spots" to be attained
> will be difficult.

Well, sure -- the same as with inlining heuristics.

>>> Thus, I don't think we want to
>>> merge this in its current form or in this stage1.
>>
>> What is the benefit of pushing this to a later release?  If anything,
>> merging the support for iterative optimizations now will allow us to
>> consider adding the wonderful smartness to it later.  In the meantime,
>> substituting that smartness with a knob is still a great alternative.

The benefit?  The benefit is to not have another magic knob in there that doesn't make too much sense and papers over real conceptual/algorithmic issues.  Brute-force iterating is a hack, not a solution.  (sorry)

> I agree (of course).  Having the knob will be very useful for testing and
> determining the acceptance criteria for the later "smartness".  While
> terminating early would be a nice optimization, the feature is still
> intrinsically useful and deployable without it.  In addition, when using LTO
> on nearly all the projects/modules I tested on, 3+ passes were always
> productive.

If that is true then I'd really like to see testcases.  Because I am sure you are just papering over (maybe even easy-to-fix) issues with the brute-force iterating approach.  We also do not have a switch to run every pass twice in succession, just because that would be as stupid as this.

> To be fair, when not using LTO, beyond 2-3 passes did not often
> produce improvements unless individual compilation units were enormous.
>
> There was also the question of whether some of the improvements seen with
> multiple passes were indicative of deficiencies in early inlining, CFG, SRA,
> etc.  If the knob is available, I'm happy to continue testing on the same
> projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl,
> scummvm, binutils, etc) and write a report on what I observe as "suspicious"
> improvements that perhaps should be caught/made in a single pass.
>
> It's worth noting again that while this is a useful feature in and of itself
> (especially when combined with LTO), it's *extremely* useful when coupled
> with the de-virtualization improvements submitted in other threads.  The
> examples submitted for inclusion in the test suite aren't academic -- they
> are reductions of real-world performance issues from a mature (and shipping)
> C++-based networking product.  Any C++ codebase that employs physical
> separation in its design via Factory patterns, Interface Segregation,
> and/or Dependency Inversion will likely see improvements.  To me, these
> enhancements combine to form one of the biggest leaps I've seen in C++ code
> optimization -- code that can be clean, OO, *and* fast.

But iterating the whole early optimization pipeline is not a sensible approach to attacking these.

Richard.

> Richard: If there's any additional testing or information I can reasonably
> provide to help get this in for this stage1, let me know.
>
> Thanks!
>
> --
> tangled strands of DNA explain the way that I behave.
> http://www.clock.org/~matt
On 2/11/2011, at 10:18 AM, Richard Guenther wrote:
>
>>>> Thus, I don't think we want to
>>>> merge this in its current form or in this stage1.
>>>
>>> What is the benefit of pushing this to a later release?  If anything,
>>> merging the support for iterative optimizations now will allow us to
>>> consider adding the wonderful smartness to it later.  In the meantime,
>>> substituting that smartness with a knob is still a great alternative.
>
> The benefit?  The benefit is to not have another magic knob in there
> that doesn't make too much sense and papers over real conceptual/algorithmic
> issues.  Brute-force iterating is a hack, not a solution.  (sorry)
>
>> I agree (of course).  Having the knob will be very useful for testing and
>> determining the acceptance criteria for the later "smartness".  While
>> terminating early would be a nice optimization, the feature is still
>> intrinsically useful and deployable without it.  In addition, when using LTO
>> on nearly all the projects/modules I tested on, 3+ passes were always
>> productive.
>
> If that is true then I'd really like to see testcases.  Because I am sure you
> are just papering over (maybe even easy-to-fix) issues with the brute-force
> iterating approach.  We also do not have a switch to run every pass twice
> in succession, just because that would be as stupid as this.
>
>> To be fair, when not using LTO, beyond 2-3 passes did not often
>> produce improvements unless individual compilation units were enormous.
>>
>> There was also the question of whether some of the improvements seen with
>> multiple passes were indicative of deficiencies in early inlining, CFG, SRA,
>> etc.  If the knob is available, I'm happy to continue testing on the same
>> projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl,
>> scummvm, binutils, etc) and write a report on what I observe as "suspicious"
>> improvements that perhaps should be caught/made in a single pass.
>>
>> It's worth noting again that while this is a useful feature in and of itself
>> (especially when combined with LTO), it's *extremely* useful when coupled
>> with the de-virtualization improvements submitted in other threads.  The
>> examples submitted for inclusion in the test suite aren't academic -- they
>> are reductions of real-world performance issues from a mature (and shipping)
>> C++-based networking product.  Any C++ codebase that employs physical
>> separation in its design via Factory patterns, Interface Segregation,
>> and/or Dependency Inversion will likely see improvements.  To me, these
>> enhancements combine to form one of the biggest leaps I've seen in C++ code
>> optimization -- code that can be clean, OO, *and* fast.
>
> But iterating the whole early optimization pipeline is not a sensible approach
> to attacking these.

Richard,

You are making valid points.  Ideally, we would have a compiler with an optimal pass pipeline that would not generate any better or worse code with second and subsequent iterations.  I disagree, however, that the patch discussed in this thread is a hack.  It is a feature.  It allows developers and power users to experiment with the pass pipeline and see what comes out of it.

The benchmarks for the 2nd version of the patch have finally finished [*].  They show that the additional iterations make a difference of 2% for a number of benchmarks.  There are also trends like "additional iterations improve -m64 and hurt -m32" -- 254.gap, 256.bzip2, 300.twolf, 171.swim, 173.applu -- and "additional iterations improve -O2 and hurt -O3" -- 179.art, 400.perlbench.  There are also enough benchmarks that show straight improvement or straight regression.  How are we going to investigate and fix these missed optimization opportunities and regressions without the feature that we are discussing here?

We already provide certain features to let users tinker with the optimization pipeline:
1. -O0/-O1/-O2/-O3 options for entry-level users,
2. -f<pass>/-fno-<pass> for advanced users,
3. --param <obscure>=MAGIC for developers and compiler-savvy users.

Iterative optimizations should be no different.  "Yes", without this feature the world would be a simpler place for us to live in, as we would not have to know and care about regressions and missed opportunities in GCC's optimizations.  But, really, that would've been a boring world :-).

I feel that dropping this feature will cause these problems to stay unnoticed and never get fixed.

[*] https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFE

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
On Tue, Nov 8, 2011 at 7:34 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> On 2/11/2011, at 10:18 AM, Richard Guenther wrote:
>>
>>>>> Thus, I don't think we want to
>>>>> merge this in its current form or in this stage1.
>>>>
>>>> What is the benefit of pushing this to a later release?  If anything,
>>>> merging the support for iterative optimizations now will allow us to
>>>> consider adding the wonderful smartness to it later.  In the meantime,
>>>> substituting that smartness with a knob is still a great alternative.
>>
>> The benefit?  The benefit is to not have another magic knob in there
>> that doesn't make too much sense and papers over real conceptual/algorithmic
>> issues.  Brute-force iterating is a hack, not a solution.  (sorry)
>>
>>> I agree (of course).  Having the knob will be very useful for testing and
>>> determining the acceptance criteria for the later "smartness".  While
>>> terminating early would be a nice optimization, the feature is still
>>> intrinsically useful and deployable without it.  In addition, when using LTO
>>> on nearly all the projects/modules I tested on, 3+ passes were always
>>> productive.
>>
>> If that is true then I'd really like to see testcases.  Because I am sure you
>> are just papering over (maybe even easy-to-fix) issues with the brute-force
>> iterating approach.  We also do not have a switch to run every pass twice
>> in succession, just because that would be as stupid as this.
>>
>>> To be fair, when not using LTO, beyond 2-3 passes did not often
>>> produce improvements unless individual compilation units were enormous.
>>>
>>> There was also the question of whether some of the improvements seen with
>>> multiple passes were indicative of deficiencies in early inlining, CFG, SRA,
>>> etc.  If the knob is available, I'm happy to continue testing on the same
>>> projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl,
>>> scummvm, binutils, etc) and write a report on what I observe as "suspicious"
>>> improvements that perhaps should be caught/made in a single pass.
>>>
>>> It's worth noting again that while this is a useful feature in and of itself
>>> (especially when combined with LTO), it's *extremely* useful when coupled
>>> with the de-virtualization improvements submitted in other threads.  The
>>> examples submitted for inclusion in the test suite aren't academic -- they
>>> are reductions of real-world performance issues from a mature (and shipping)
>>> C++-based networking product.  Any C++ codebase that employs physical
>>> separation in its design via Factory patterns, Interface Segregation,
>>> and/or Dependency Inversion will likely see improvements.  To me, these
>>> enhancements combine to form one of the biggest leaps I've seen in C++ code
>>> optimization -- code that can be clean, OO, *and* fast.
>>
>> But iterating the whole early optimization pipeline is not a sensible approach
>> to attacking these.
>
> Richard,
>
> You are making valid points.  Ideally, we would have a compiler with an
> optimal pass pipeline that would not generate any better or worse code with
> second and subsequent iterations.  I disagree, however, that the patch
> discussed in this thread is a hack.  It is a feature.  It allows developers
> and power users to experiment with the pass pipeline and see what comes out
> of it.
>
> The benchmarks for the 2nd version of the patch have finally finished [*].
> They show that the additional iterations make a difference of 2% for a
> number of benchmarks.  There are also trends like "additional iterations
> improve -m64 and hurt -m32" -- 254.gap, 256.bzip2, 300.twolf, 171.swim,
> 173.applu -- and "additional iterations improve -O2 and hurt -O3" --
> 179.art, 400.perlbench.  There are also enough benchmarks that show straight
> improvement or straight regression.  How are we going to investigate and fix
> these missed optimization opportunities and regressions without the feature
> that we are discussing here?
>
> We already provide certain features to let users tinker with the
> optimization pipeline:
> 1. -O0/-O1/-O2/-O3 options for entry-level users,
> 2. -f<pass>/-fno-<pass> for advanced users,
> 3. --param <obscure>=MAGIC for developers and compiler-savvy users.
>
> Iterative optimizations should be no different.  "Yes", without this feature
> the world would be a simpler place for us to live in, as we would not have
> to know and care about regressions and missed opportunities in GCC's
> optimizations.  But, really, that would've been a boring world :-).
>
> I feel that dropping this feature will cause these problems to stay
> unnoticed and never get fixed.

How?  You are not even _analyzing_ why you are getting benefits, or filing bugs.  Did you experiment with reducing the set of passes you iterate?  Thus, which passes really do help when iterating?  I've asked you that before -- and showed proper analysis that could be done to detect the indirect inlining benefit that you might see (surely not on SPEC 2000, though) -- and said that yes, it _would_ make sense to do something about this specific case, and yes, iterating for a selected set of functions is a valid solution for the indirect inlining case.  It is never, IMHO, when no inlining happened between the iterations.

You are basically saying, well -- let's iterate the optimizers a few times and hope they will catch something.  Sure they will.  Show the important pieces and analyze why they need the iteration.

You should also realize that with each of these "power-user" features the GCC testing matrix becomes larger.  If I were selling commercial support for GCC, I'd make all but -O* unsupported.

Thanks,
Richard.
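[Editor's note: the alternative Richard argues for -- iterate only functions for which something concrete changed, e.g. a new direct call edge was resolved -- amounts to a small worklist algorithm. The sketch below is a hypothetical model of that idea, not GCC's pass manager; both callbacks are invented names.]

```python
# Selective re-optimization: a function is put back on the worklist only
# if the previous round resolved new direct call edges for it, instead of
# brute-force re-running early opts over everything a fixed number of times.

def selective_iterate(functions, resolve_new_edges, reoptimize, max_rounds=5):
    """resolve_new_edges(fn) -> True if fn gained a new direct edge this
    round; reoptimize(fn) re-runs the early optimizations on fn."""
    worklist = set(functions)
    rounds = 0
    while worklist and rounds < max_rounds:
        rounds += 1
        changed = set()
        for fn in sorted(worklist):
            reoptimize(fn)
            # Only functions that gained a direct edge can profit again.
            if resolve_new_edges(fn):
                changed.add(fn)
        worklist = changed
    return rounds

# Demo: "a" devirtualizes one call on its first run; "b" never changes.
counts = {}
def _reoptimize(fn):
    counts[fn] = counts.get(fn, 0) + 1

_pending_edges = {"a": 1}
def _resolve(fn):
    if _pending_edges.get(fn, 0) > 0:
        _pending_edges[fn] -= 1
        return True
    return False

rounds = selective_iterate(["a", "b"], _resolve, _reoptimize)
```

Only the function that gained a direct edge is revisited; a function where nothing changed is optimized exactly once. That is the analysis-driven behavior argued for above, as opposed to an unconditional iteration count.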