Message ID | orlhjm2usl.fsf@livre.home |
---|---|
State | New |
Headers | show |
On Wed, Feb 25, 2015 at 10:40 AM, Alexandre Oliva <aoliva@redhat.com> wrote: > This patch fixes a problem that has been with us for several years. > Variable tracking has no idea about the end of the lifetime of inlined > variables, so it keeps on computing locations for them over and over, > even though the computed locations make no sense whatsoever because the > variable can't even be accessed any more. > > With this patch, we unbind all inlined variables at the point the > inlined function returns to, so that the locations for those variables > will not be touched any further. > > In theory, we could do something similar to non-inlined auto variables, > when they go out of scope, but their decls apply to the entire function > and I'm told gdb sort-of expects the variables to be accessible > throughout the function, so I'm not tackling that in this patch, for I'm > happy enough with what this patch gets us: > > - almost 99% reduction in the output asm for the PR testcase > > - more than 90% reduction in the peak memory use compiling that testcase > > - 63% reduction in the compile time for that testcase > > What's scary is that the testcase is not particularly pathological. Any > function that calls a longish sequence of inlined functions, that in > turn call other inline functions, and so on, something that's not > particularly unusual in C++, will likely observe significant > improvement, as we won't see growing sequences of var_location notes > after each call or so, as var-tracking computes a new in-stack location > for the implicit this argument of each previously-inlined function. > > Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install? But code-motion could still move stmts from the inlined functions across these resets? That said - shouldn't this simply performed by proper var-tracking u-ops produced by a backward scan over the function for "live" scope-blocks? That is, when you see a scope block becoming live from exit then add u-ops resetting all vars from that scope block? Your patch as-is would add very many debug stmts to for example tramp3d. And as you say, the same reasoning applies to scopes in general, not just for inlines. Richard. > Reset inlined debug variables at the end of the inlined function > > From: Alexandre Oliva <aoliva@redhat.com> > > for gcc/ChangeLog > > PR debug/58315 > * tree-inline.c (reset_debug_binding): New. > (reset_debug_bindings): Likewise. > (expand_call_inline): Call it. > --- > gcc/tree-inline.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 56 insertions(+) > > diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c > index d8abe03..5b58d8b 100644 > --- a/gcc/tree-inline.c > +++ b/gcc/tree-inline.c > @@ -4345,6 +4345,60 @@ add_local_variables (struct function *callee, struct function *caller, > } > } > > +/* Add to BINDINGS a debug stmt resetting SRCVAR if inlining might > + have brought in or introduced any debug stmts for SRCVAR. */ > + > +static inline void > +reset_debug_binding (copy_body_data *id, tree srcvar, gimple_seq *bindings) > +{ > + tree *remappedvarp = id->decl_map->get (srcvar); > + > + if (!remappedvarp) > + return; > + > + if (TREE_CODE (*remappedvarp) != VAR_DECL) > + return; > + > + if (*remappedvarp == id->retvar || *remappedvarp == id->retbnd) > + return; > + > + tree tvar = target_for_debug_bind (*remappedvarp); > + if (!tvar) > + return; > + > + gdebug *stmt = gimple_build_debug_bind (tvar, NULL_TREE, > + id->call_stmt); > + gimple_seq_add_stmt (bindings, stmt); > +} > + > +/* For each inlined variable for which we may have debug bind stmts, > + add before GSI a final debug stmt resetting it, marking the end of > + its life, so that var-tracking knows it doesn't have to compute > + further locations for it. */ > + > +static inline void > +reset_debug_bindings (copy_body_data *id, gimple_stmt_iterator gsi) > +{ > + tree var; > + unsigned ix; > + gimple_seq bindings = NULL; > + > + if (!gimple_in_ssa_p (id->src_cfun)) > + return; > + > + if (!opt_for_fn (id->dst_fn, flag_var_tracking_assignments)) > + return; > + > + for (var = DECL_ARGUMENTS (id->src_fn); > + var; var = DECL_CHAIN (var)) > + reset_debug_binding (id, var, &bindings); > + > + FOR_EACH_LOCAL_DECL (id->src_cfun, ix, var) > + reset_debug_binding (id, var, &bindings); > + > + gsi_insert_seq_before_without_update (&gsi, bindings, GSI_SAME_STMT); > +} > + > /* If STMT is a GIMPLE_CALL, replace it with its inline expansion. */ > > static bool > @@ -4650,6 +4704,8 @@ expand_call_inline (basic_block bb, gimple stmt, copy_body_data *id) > GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE), > bb, return_block, NULL); > > + reset_debug_bindings (id, stmt_gsi); > + > /* Reset the escaped solution. */ > if (cfun->gimple_df) > pt_solution_reset (&cfun->gimple_df->escaped); > > > -- > Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ > You must be the change you wish to see in the world. -- Gandhi > Be Free! -- http://FSFLA.org/ FSF Latin America board member > Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
On Wed, Feb 25, 2015 at 11:54:16AM +0100, Richard Biener wrote: > > Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install? > > But code-motion could still move stmts from the inlined functions > across these resets? That said - shouldn't this simply performed > by proper var-tracking u-ops produced by a backward scan over the > function for "live" scope-blocks? That is, when you see a scope > block becoming live from exit then add u-ops resetting all > vars from that scope block? Yeah, wanted to say the same, I'm afraid such a change will very much affect debugging experience close to the end of inlined functions, as sinking, scheduling and various other passes may move statements from the inline functions across those resets. And, various tools and users really want to be able to inspect variables and parameters on the return statement. So, IMHO to do something like this, we'd need to mark those debug stmts some way to say that they aren't normal debug binds, but debug binds at the end of scopes (whether inlined functions or just lexical blocks), and optimization passes that perform code motion should try to detect the case when they are moving downward some statements across such debug stmts and move those debug stmts along with those if possible. And another thing is the amount of the added debug stmts, right now we don't add debug stmts all the time for everything, just when something is needed, while your patch adds it unconditionally, even when debug stmts for those won't be really emitted. As they are just resets, that hopefully will not drastically affect var-tracking time, but might affect other optimization passes, which would need to deal with much more statements than before. Jakub
On Feb 25, 2015, Richard Biener <richard.guenther@gmail.com> wrote: > But code-motion could still move stmts from the inlined functions > across these resets? Sure, just like it could still move stmts across any other debug stmts. Once you return from a function, it's as if all of its variables ceased to exist, so what is the problem with this? The real error, IMHO, is to assume the moved instruction is still inside the inline function. It isn't. If you wanted to inspect the variable before it went out of scope, the debugger should have helped you stop there, not stop you at an instruction that is outside the expected flow. > That said - shouldn't this simply performed by proper var-tracking > u-ops produced by a backward scan over the function for "live" > scope-blocks? Please elaborate (ideally with a patch); I have no idea of how you plan to map scope blocks to their original (and thus correct) position (i.e., before any code motion). > That is, when you see a scope block becoming live from exit then add > u-ops resetting all vars from that scope block? Oh, you want code motion artifacts to leak into the language VM execution modeled in debug info? That would be fundamentally against the model implemented with debug stmts and insns. /me mumbles something about mixing metaphors and leaky screwdrivers ;-D IOW, I don't think that would be appropriate at all. Remember the VTA presentation at the GCC Summit years ago, when I showed you just can't hope to recover debug info from code already mangled by the compiler, because optimizations are lossy? You get to a point in which you don't know what you don't know, so you're bound to make wrong decisions. So, take note of what you're going to need when you know it's still accurate. > Your patch as-is would add very many debug stmts to for example > tramp3d. Do you have any evidence that this would have a negative impact on compile time, memory use or any other relevant metric? Odds are that, if it makes any difference whatsoever, it will be a very positive one. The absolute worst case of this patch is doubling the debug stmt count (proof: it will add at most one debug stmt per inlined variable that had at least one debug stmt). Now, if you're concerned about debug stmt count, we could introduce another debug stmt/insn form that can reset multiple variables in a single swoop. It does not seem to me like this would be worth the effort. > And as you say, the same reasoning applies to scopes in general, not > just for inlines. I actually said the opposite. We turn block-local variables into function-wide declarations very early, so apparently we *can* reference the variables even when they're out of scope. But we *cannot* do that for inlined variables. That's why I drew the line where I did. (Plus, introducing the debug temps at the point I did was easy to try, and it had a huge positive impact :-) Sure we *could* introduce debug unbind stmts at the end of scopes. If you have evidence or even a hunch that this will have a positive effect on relevant metrics, go for it!
On Feb 25, 2015, Jakub Jelinek <jakub@redhat.com> wrote: > various tools and users really want to > be able to inspect variables and parameters on the return statement. This patch won't affect the return statement. The resets are at the return-to statement; if you stop at the return statement (assuming you have code generated for it in place), you should still get the same bindings. Now, if *all* the code for the return statement was optimized away, and you stop at the subsequent instruction, that happens to be past the return, then, yeah, you're out of luck, but then you were already out of luck before. Now, there is of course the case in which there is some code left in place for the return stmt, but it is no longer in its natural position in the code flow. You stop there, you're technically out of the inline scope, but now you're also past the "clobbering" of the inlined variables. The real solution for this is to implement the stmt frontiers I presented at some GCC Summit, so that, when you stop at a statement, you get the state you expect regardless of code motion, because you get the state at the natural flow of control, even if node actual code remained at that point. > And another thing is the amount of the added debug stmts, right now we don't > add debug stmts all the time for everything, just when something is needed, We add debug stmts whenever a trackable (auto "gimple register") variable is modified. They are "clobbered" at the end of the inline function they expanded out of, so this just corrects an long-standing and quite expensive oversight. You won't get debug stmts for unused inlined variables, for example: you should only get them for variables that were remapped while inlining the code to begin with. If you got code for them and they are trackable, you certainly got debug stmts for them as well. > while your patch adds it unconditionally, even when debug stmts for those > won't be really emitted. It shouldn't. Please show me? > As they are just resets, that hopefully will not > drastically affect var-tracking time They will. But in a positive way :-) > but might affect other optimization passes, which would need to deal > with much more statements than before. It shouldn't be hard to test this hypothesis one way or another. Tweak the code that introduces the new debug temps so that they all refer to the same fake variable, as opposed to resetting the intended variables, and then you'll have an exact measure of the compile-time impact *without* the savings afforded by the resets. Then we can compare whether it's an overall win or loss. My measurements, for a not particularly unusual testcase, showed an overall reduction of 63% in compile time, as indicated yesterday. Now, who should bear the burden of collecting evidence to back up the claims against the change? Are those concerns enough to hold it up?
On Wed, Feb 25, 2015 at 06:17:33PM -0300, Alexandre Oliva wrote: > My measurements, for a not particularly unusual testcase, showed an > overall reduction of 63% in compile time, as indicated yesterday. Now, > who should bear the burden of collecting evidence to back up the claims > against the change? Are those concerns enough to hold it up? Can you e.g. run dwlocstat on some larger C++ binaries built without and with your patch? I believe dwlocstat is supposed to count only the instructions where the variables or parameters are in scope, so should be exactly what we care about here. E.g. cc1plus and libstdc++.so.6 might be good candidates from gcc itself, perhaps firefox or similar as something even larger. Jakub
On Feb 25, 2015, Jakub Jelinek <jakub@redhat.com> wrote: > On Wed, Feb 25, 2015 at 06:17:33PM -0300, Alexandre Oliva wrote: >> My measurements, for a not particularly unusual testcase, showed an >> overall reduction of 63% in compile time, as indicated yesterday. Now, >> who should bear the burden of collecting evidence to back up the claims >> against the change? Are those concerns enough to hold it up? > Can you e.g. run dwlocstat on some larger C++ binaries built without and > with your patch? I believe dwlocstat is supposed to count only the > instructions where the variables or parameters are in scope, so should be > exactly what we care about here. Erhm... I don't think that would cover the case you were worried about, namely inspecting variables of an inlined function while at a statement moved out of the function ranges. Anyway, I've run dwlocstat and inspected the results. There is indeed a significant reduction in the coverage, so I looked into that. What I found out is that the reduction is an improvement on yet another long-standing var-tracking issue: if a function is called within a loop and we inline it, bindings from the call in one iteration of the loop will carry over onto the subsequent iteration until a new binding occurs. This accounts for all of the coverage reductions I've investigated. This, in turn, suggests that introducing reset stmts when variables go out of scope even in local blocks will further reduce debug info, although in some cases it might have the opposite effect. E.g., if the actual live range of a variable is scattered across multiple non-contiguous blocks, stopping the binding from being used in intervening blocks where the variable is not live will cause additional entries in the location list. I've observed this effect with the proposed patch, too, but it removes a lot more nonsensical entries than it adds entries to account for not covering intervening ranges by accident.
On Wed, Feb 25, 2015 at 09:01:09PM -0300, Alexandre Oliva wrote: > > On Wed, Feb 25, 2015 at 06:17:33PM -0300, Alexandre Oliva wrote: > >> My measurements, for a not particularly unusual testcase, showed an > >> overall reduction of 63% in compile time, as indicated yesterday. Now, > >> who should bear the burden of collecting evidence to back up the claims > >> against the change? Are those concerns enough to hold it up? > > > Can you e.g. run dwlocstat on some larger C++ binaries built without and > > with your patch? I believe dwlocstat is supposed to count only the > > instructions where the variables or parameters are in scope, so should be > > exactly what we care about here. > > Erhm... I don't think that would cover the case you were worried about, > namely inspecting variables of an inlined function while at a statement > moved out of the function ranges. > > Anyway, I've run dwlocstat and inspected the results. There is indeed a > significant reduction in the coverage, so I looked into that. Significant reduction in the coverage should be a red flag. I admit I haven't read dwlocstat sources recently, but: dwlocstat ========= This is a tool for examining Dwarf location info coverage. It goes through DIEs of given binary's debug info that represent variables and function parameters. For each such DIE, it computes coverage of that DIE's range (list of addresses of DIE's scope) by location expressions (a description of where given variable is located at given location: e.g. a variable can be in a register). matches what I wanted the tool to do, namely, if you have say some DW_TAG_variable that is in scope of say an inline function and that DW_TAG_inlined_subroutine has DW_AT_ranges L1-L2, L3-L4, L5-L6, it counts on what percentage of bytes in those ranges (or instructions?) the variable has defined location. The fear I have is that due to scheduling or any other code motion toward the end of function, you simply have instructions like L5-L6 above that are considered part of the function, but that the newly added resets are say on L4 or so and thus whenever you stop on the L5-L6 instructions, the debugger will show you you are in the inline function, but you won't be able to inspect most if not any variables. > What I found out is that the reduction is an improvement on yet another > long-standing var-tracking issue: if a function is called within a loop > and we inline it, bindings from the call in one iteration of the loop > will carry over onto the subsequent iteration until a new binding > occurs. This accounts for all of the coverage reductions I've > investigated. It really depends. Say if you have some inline void foo (void) { int somevar; LARGE CODE NOT INITIALIZING somevar; somevar = VALUE; USE somevar; } and for (...) { foo (); } and the loop is unrolled, then indeed, lost of coverage between the start of the second foo and somevar = VALUE; is acceptable, if it previously just contained the value from previous iteration. But as I said above, if it is about the last few stmts in the inline scope, it is undesirable, and if e.g. the instructions from different unrolled iterations are intermixed, then it is undesirable too. Which is why I was suggesting special debug binds that the optimizers would try to move downward (towards the exit block) on (as much as possible) any code motions across them, unless the debug binds would be moved across a debug bind for the same decl - in that case the end of scope debug bind should be removed rather than moved over. In most cases, there won't be much of code motion across the whole function, but only limited scope, so in the common case you'll still get the effect you are looking for or close to that, but it won't actually penalize the debug info quality. Jakub
On Thu, Feb 26, 2015 at 1:01 AM, Alexandre Oliva <aoliva@redhat.com> wrote: > On Feb 25, 2015, Jakub Jelinek <jakub@redhat.com> wrote: > >> On Wed, Feb 25, 2015 at 06:17:33PM -0300, Alexandre Oliva wrote: >>> My measurements, for a not particularly unusual testcase, showed an >>> overall reduction of 63% in compile time, as indicated yesterday. Now, >>> who should bear the burden of collecting evidence to back up the claims >>> against the change? Are those concerns enough to hold it up? > >> Can you e.g. run dwlocstat on some larger C++ binaries built without and >> with your patch? I believe dwlocstat is supposed to count only the >> instructions where the variables or parameters are in scope, so should be >> exactly what we care about here. > > Erhm... I don't think that would cover the case you were worried about, > namely inspecting variables of an inlined function while at a statement > moved out of the function ranges. > > Anyway, I've run dwlocstat and inspected the results. There is indeed a > significant reduction in the coverage, so I looked into that. > > What I found out is that the reduction is an improvement on yet another > long-standing var-tracking issue: if a function is called within a loop > and we inline it, bindings from the call in one iteration of the loop > will carry over onto the subsequent iteration until a new binding > occurs. This accounts for all of the coverage reductions I've > investigated. > > This, in turn, suggests that introducing reset stmts when variables go > out of scope even in local blocks will further reduce debug info, > although in some cases it might have the opposite effect. E.g., if the > actual live range of a variable is scattered across multiple > non-contiguous blocks, stopping the binding from being used in > intervening blocks where the variable is not live will cause additional > entries in the location list. I've observed this effect with the > proposed patch, too, but it removes a lot more nonsensical entries than > it adds entries to account for not covering intervening ranges by > accident. Answering your questions on my mail here (because it fits I think), and more concentrating on the effect of your patch as opposed to debug stmt philosophy. Your patch ends up pruning the var-tracking sets at the additional reset-points you introduce. You introduce them at inline boundaries (which looks reasonable minus code-motion issues). I claim you can achieve the same result by effectively inserting those reset debug insns right before var-tracking itself and by re-computing BLOCK "liveness". That will automatically deal with code motion that extends the life-range of an inlined BLOCK by moving stmts (still associated with that BLOCK) across the return point of the inlined call (and thus across your debug resets). And it will allow var-tracking to eventually compute locations for vars at the "entry" of that BLOCK piece. After all you say correctly that what matters is location info for X in places where a debug consumer can successfully lookup X (that is, in PC ranges where the scope X was declared in is active). In other places there is no reason to emit location info for X (but we might still want to compute it during var-tracking if at a later PC range the scope will be active again). Now I said you could do var-tracking uops for those resets somehow, but I now realize that I have not much internal knowledge of var-tracking. Thus consider the suggestion to insert reset debug insns at the beginning of var-tracking and at points where a scope becomes dead (thus after points where in a backward CFG walk a scope becomes live). Richard. > -- > Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ > You must be the change you wish to see in the world. -- Gandhi > Be Free! -- http://FSFLA.org/ FSF Latin America board member > Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
On Thu, Feb 26, 2015 at 11:29:35AM +0100, Richard Biener wrote: > I claim you can achieve the same result by effectively inserting > those reset debug insns right before var-tracking itself and by > re-computing BLOCK "liveness". That will automatically deal > with code motion that extends the life-range of an inlined BLOCK > by moving stmts (still associated with that BLOCK) across the > return point of the inlined call (and thus across your debug resets). > And it will allow var-tracking to eventually compute locations for > vars at the "entry" of that BLOCK piece. If that would work, it would be nice, but I'm not sure how you can do that. Using something like final.c (reemit_insn_block_notes) you can discover the ranges of scopes (inline functions and lexical blocks). If some scope has a single range, it is the easy case, you know where it starts and where it ends. For scopes with multiple ranges, how can you find out what case it is though? I mean, either it can be just the case that some instruction(s) with different scope got scheduled (or sunk / whatever) in between the instructions of the scope, in that case resetting the vars there is highly undesirable (would majorly grow the .debug_loc lists and break debuggers, e.g. when you try to watch some variable through the live of some inline, if you reset it any time a single unrelated insn is encountered, the debugger would need to tell you the var got lost/changed). Or it can be the case of unrolled loop which has lexical scopes or inlines in it, in that case you will have multiple "same" scopes, but in that case it is unnecessary to preserve variables in between those, it is really different inlines. Jakub
On Thu, Feb 26, 2015 at 11:42 AM, Jakub Jelinek <jakub@redhat.com> wrote: > On Thu, Feb 26, 2015 at 11:29:35AM +0100, Richard Biener wrote: >> I claim you can achieve the same result by effectively inserting >> those reset debug insns right before var-tracking itself and by >> re-computing BLOCK "liveness". That will automatically deal >> with code motion that extends the life-range of an inlined BLOCK >> by moving stmts (still associated with that BLOCK) across the >> return point of the inlined call (and thus across your debug resets). >> And it will allow var-tracking to eventually compute locations for >> vars at the "entry" of that BLOCK piece. > > If that would work, it would be nice, but I'm not sure how you can do that. > Using something like final.c (reemit_insn_block_notes) you can discover > the ranges of scopes (inline functions and lexical blocks). > If some scope has a single range, it is the easy case, you know where it > starts and where it ends. For scopes with multiple ranges, how can you find > out what case it is though? I mean, either it can be just the case that > some instruction(s) with different scope got scheduled (or sunk / whatever) > in between the instructions of the scope, in that case resetting the vars > there is highly undesirable (would majorly grow the .debug_loc lists and > break debuggers, e.g. when you try to watch some variable through the live > of some inline, if you reset it any time a single unrelated insn is > encountered, the debugger would need to tell you the var got lost/changed). > Or it can be the case of unrolled loop which has lexical scopes or inlines > in it, in that case you will have multiple "same" scopes, but in that case > it is unnecessary to preserve variables in between those, it is really > different inlines. I don't claim you can get the "maximal" pruning of the var-tracking solutions. I just claim that you can do something sensible by computing where BLOCKs go dead. And that this would be better than simply ignoring the code motion issue alltogether. If Alexs solution would be acceptable in terms of that issue then we can as well just insert debug resets for each variable at initial BLOCK boundaries. After all if the inliner inserts resets just for vars that already have debug stmts then I cook up a testcase where those debug stmts only appear after inlining. Indeed if we want to be as close to the source as possible we should insert debug stmts from the start (where the values are still computed) so that code-motion will simply make it unavailable (and also reset locations so you don't get gdb jumping around). Richard. > > Jakub
Jakub Jelinek <jakub@redhat.com> writes: > it counts on what percentage of bytes in those ranges (or instructions?) > the variable has defined location. Yes, it counts bytes. It doesn't know anything about instruction lengths etc. Thanks, Petr
On Feb 26, 2015, Richard Biener <richard.guenther@gmail.com> wrote: > and more concentrating on the effect of your patch as opposed to > debug stmt philosophy. > (which looks reasonable minus code-motion issues). > (but we might still want to compute it during var-tracking > if at a later PC range the scope will be active again). By taking debug stmt philosophy out of your reasoning, you come to a conclusion that directly contradicts the very philosophy. I.e., you're not setting the philosophy aside, you're setting yourself up for failure within that philosophy. Your note on code-motion issues shows another weakness, not in the philosophy I endorse, but on rather on the one you do. If annotations don't move, and they carry all the information debuggers need to care about, there's no code-motion issue whatsoever, and moving executable instructions would not have any effects whatsoever. Sure, we're not there yet, but if we want to get there, taking steps in the opposite direction won't take us there any time soon. > Thus consider the suggestion to insert reset debug insns at the beginning > of var-tracking and at points where a scope becomes dead (thus after > points where in a backward CFG walk a scope becomes live). The point where the scope becomes dead under what philosophical line? I believe we've already established that trying to recover that information after throwing it away is cheaper but error prone. I believe we've already agreed that we don't want debug information to be incorrect.
On Feb 26, 2015, Richard Biener <richard.guenther@gmail.com> wrote: > After all if the inliner inserts resets just for vars that already > have debug stmts then I cook up a testcase where those debug stmts > only appear after inlining. Please try that. Hint: the actual requirement is that the VTA-trackable var has been remapped during inlining of executable code. Its having debug stmts is a consequence of its being there. I wonder, thus, how the compiler would bring a var that hadn't even been remapped back from the dead, so as to introduce a debug stmt for it. Tough! :-) > Indeed if we want to be as close to the source as possible we should > insert debug stmts from the start (where the values are still computed) > so that code-motion will simply make it unavailable (and also reset > locations so you don't get gdb jumping around). *nod* Statement frontiers notes are a proposed solution for the jumping around and eliminating code motion issues from debug info.
On Feb 25, 2015, Alexandre Oliva <aoliva@redhat.com> wrote: > if a function is called within a loop and we inline it, bindings from > the call in one iteration of the loop will carry over onto the > subsequent iteration until a new binding occurs. Wait, I have to take that back and revisit the code I looked at. var-tracking handles confluences differently between VTA-tracked and old-style var-tracking vars, and I was probably confused by that: it occurred to me this morning that confluence points of VTA-tracked variables perform a set intersection, whereas old-style VTA performs set union across locations at all incoming edges. So, what I have concluded should only apply to old-style var-tracking vars, or to VTA-tracked vars in unrolled loops whose intermediate conditions were optimized out (so that there isn't any confluence with incoming edges without bindings for the var). I'm afraid I'll have to look again and figure out what I got wrong. Apologies for the noise.
On Feb 26, 2015, Jakub Jelinek <jakub@redhat.com> wrote: > On Wed, Feb 25, 2015 at 09:01:09PM -0300, Alexandre Oliva wrote: >> > On Wed, Feb 25, 2015 at 06:17:33PM -0300, Alexandre Oliva wrote: >> >> My measurements, for a not particularly unusual testcase, showed an >> >> overall reduction of 63% in compile time, as indicated yesterday. Now, >> >> who should bear the burden of collecting evidence to back up the claims >> >> against the change? Are those concerns enough to hold it up? >> >> > Can you e.g. run dwlocstat on some larger C++ binaries built without and >> > with your patch? I believe dwlocstat is supposed to count only the >> > instructions where the variables or parameters are in scope, so should be >> > exactly what we care about here. >> >> Erhm... I don't think that would cover the case you were worried about, >> namely inspecting variables of an inlined function while at a statement >> moved out of the function ranges. >> >> Anyway, I've run dwlocstat and inspected the results. There is indeed a >> significant reduction in the coverage, so I looked into that. > Significant reduction in the coverage should be a red flag. Ok, I looked into it further, after patching dwlocstat to dump per-variable per-range coverage/length info, so as to be able to compare object files more easily. So far, all the differences I looked at were caused by padding at the end of BBs, and by jump stmts without line numbers at the end of BBs, both right after the debug reset stmts the proposed patch introduces. I'll dig further, because so far I've only looked at a few cases. I have to figure out some way to automate the investigation of these differences, because it has been too time-intensive, and not really fruitful in terms of finding the scenarios you're concerned with. The good news is that, looking deeper into cases that appeared to leak info across loop iterations, I confirmed I was mistaken in yesterday's analysis. Phew! :-)
Alexandre Oliva <aoliva@redhat.com> writes: > Ok, I looked into it further, after patching dwlocstat to dump > per-variable per-range coverage/length info, so as to be able to compare > object files more easily. If you send me those patches, I can finish them, bind the functionality to a command line option, and merge upstream. Thanks, Petr
On Wed, Feb 25, 2015 at 10:40 AM, Alexandre Oliva <aoliva@redhat.com> wrote: > This patch fixes a problem that has been with us for several years. > Variable tracking has no idea about the end of the lifetime of inlined > variables, so it keeps on computing locations for them over and over, > even though the computed locations make no sense whatsoever because the > variable can't even be accessed any more. > > With this patch, we unbind all inlined variables at the point the > inlined function returns to, so that the locations for those variables > will not be touched any further. > > In theory, we could do something similar to non-inlined auto variables, > when they go out of scope, but their decls apply to the entire function > and I'm told gdb sort-of expects the variables to be accessible > throughout the function, so I'm not tackling that in this patch, for I'm > happy enough with what this patch gets us: > > - almost 99% reduction in the output asm for the PR testcase > > - more than 90% reduction in the peak memory use compiling that testcase > > - 63% reduction in the compile time for that testcase > > What's scary is that the testcase is not particularly pathological. Any > function that calls a longish sequence of inlined functions, that in > turn call other inline functions, and so on, something that's not > particularly unusual in C++, will likely observe significant > improvement, as we won't see growing sequences of var_location notes > after each call or so, as var-tracking computes a new in-stack location > for the implicit this argument of each previously-inlined function. > > Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install? Some numbers for tramp3d at -Ofast -g with an optimized, release checking (but not bootstrapped) trunk. Compile-time memory usage seems unchanged (~980MB, to show differences we'd probably need to ggc-collect more often). Compile-time was slightly faster with the patch, 45s vs. 47s, but the machine wasn't completely un-loaded. var-tracking parts: unpatched: variable tracking : 0.63 ( 1%) usr 0.03 ( 1%) sys 0.82 ( 2%) wall 28641 kB ( 2%) ggc var-tracking dataflow : 3.72 ( 8%) usr 0.04 ( 1%) sys 3.65 ( 7%) wall 1337 kB ( 0%) ggc var-tracking emit : 2.63 ( 6%) usr 0.02 ( 1%) sys 2.55 ( 5%) wall 148835 kB ( 8%) ggc patched: variable tracking : 0.64 ( 1%) usr 0.01 ( 0%) sys 0.72 ( 1%) wall 24202 kB ( 1%) ggc var-tracking dataflow : 1.96 ( 4%) usr 0.01 ( 0%) sys 1.94 ( 4%) wall 1326 kB ( 0%) ggc var-tracking emit : 1.46 ( 3%) usr 0.02 ( 0%) sys 1.49 ( 3%) wall 46980 kB ( 3%) ggc we have at the point of RTL expansion 56% more debug statements (988231 lines with # DEBUG in the .optimized dump out of 1212518 lines in total vs. 630666 out of 854953). So we go from roughly 1 debug stmt per real stmt to 1.5 debug stmts per real stmt. I didn't try to inspect generated debug information or asses its quality. But I wonder for example what we do for _25 = MEM[(const struct DomainD.47403 *)this_11(D) + 8B].D.47690.domain_mD.47464; # DEBUG dD.570173 => _25 # DEBUG dD.570173 => NULL # DEBUG thisD.572552 => NULL npc_14 = _25 / _27; # DEBUG npcD.171771 => npc_14 # DEBUG D#447 => &this_11(D)->blocks_mD.64412.D.47809 # DEBUG thisD.572554 => D#447 # DEBUG dD.570173 => _25 # DEBUG dD.570173 => NULL # DEBUG thisD.572554 => NULL remainder_16 = _25 % _27; we have two pairs of DEBUG stmts for dD.570173 here, binding to _25 and then immediately resetting. Apart from that we might possibly DCE those I wonder if this isn't breaking debug experience vs. what we had previously: _25 = MEM[(const struct DomainD.47403 *)this_11(D) + 8B].D.47690.domain_mD.47464; # DEBUG dD.570173 => _25 npc_14 = _25 / _27; # DEBUG npcD.171771 => npc_14 # DEBUG D#447 => &this_11(D)->blocks_mD.64412.D.47809 # DEBUG thisD.572554 => D#447 # DEBUG dD.570173 => _25 remainder_16 = _25 % _27; dD.570173 is never reset anywhere in the function without the patch. Just for curiosity here is the last dump piece again, this time with location information (we should dump the associated BLOCK someow) [tramp3d-v4.cpp:3457:45] _25 = [tramp3d-v4.cpp:3457:45] MEM[(const struct DomainD.47403 *)this_11(D) + 8B].D.47690.domain_mD.47464; [tramp3d-v4.cpp:3457:45] # DEBUG dD.570173 => _25 [tramp3d-v4.cpp:53176:32] npc_14 = _25 / _27; [tramp3d-v4.cpp:53176:32] # DEBUG npcD.171771 => npc_14 [tramp3d-v4.cpp:53177:33] # DEBUG D#447 => [tramp3d-v4.cpp:53177:33] &[tramp3d-v4.cpp:53177:19] this_11(D)->blocks_mD.64412.D.47809 [tramp3d-v4.cpp:53177:33] # DEBUG thisD.572554 => D#447 [tramp3d-v4.cpp:3457:45] # DEBUG dD.570173 => _25 [tramp3d-v4.cpp:53177:38] remainder_16 = _25 % _27; like 3457 is inline Element_t first() const { return DT::first(this->domain_m); } and 5317[67] are int npc = blocks_m.first() / contexts; int remainder = blocks_m.first() % contexts; 'd' comes from DT::first(this->domain_m); which is on line 4604: inline static Element_t first(Storage_t d) { return d; } so it assigns a name to this->domain_m but we don't have any assembler instruction left for that BLOCK. Which means my pragmatic approach would have re-set the variable after the second bind only. We likely still keep the inline BLOCK (just wonder what PC range we assign to it and if I could break on it in gdb...). Richard. > Reset inlined debug variables at the end of the inlined function > > From: Alexandre Oliva <aoliva@redhat.com> > > for gcc/ChangeLog > > PR debug/58315 > * tree-inline.c (reset_debug_binding): New. > (reset_debug_bindings): Likewise. > (expand_call_inline): Call it. > --- > gcc/tree-inline.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 56 insertions(+) > > diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c > index d8abe03..5b58d8b 100644 > --- a/gcc/tree-inline.c > +++ b/gcc/tree-inline.c > @@ -4345,6 +4345,60 @@ add_local_variables (struct function *callee, struct function *caller, > } > } > > +/* Add to BINDINGS a debug stmt resetting SRCVAR if inlining might > + have brought in or introduced any debug stmts for SRCVAR. */ > + > +static inline void > +reset_debug_binding (copy_body_data *id, tree srcvar, gimple_seq *bindings) > +{ > + tree *remappedvarp = id->decl_map->get (srcvar); > + > + if (!remappedvarp) > + return; > + > + if (TREE_CODE (*remappedvarp) != VAR_DECL) > + return; > + > + if (*remappedvarp == id->retvar || *remappedvarp == id->retbnd) > + return; > + > + tree tvar = target_for_debug_bind (*remappedvarp); > + if (!tvar) > + return; > + > + gdebug *stmt = gimple_build_debug_bind (tvar, NULL_TREE, > + id->call_stmt); > + gimple_seq_add_stmt (bindings, stmt); > +} > + > +/* For each inlined variable for which we may have debug bind stmts, > + add before GSI a final debug stmt resetting it, marking the end of > + its life, so that var-tracking knows it doesn't have to compute > + further locations for it. */ > + > +static inline void > +reset_debug_bindings (copy_body_data *id, gimple_stmt_iterator gsi) > +{ > + tree var; > + unsigned ix; > + gimple_seq bindings = NULL; > + > + if (!gimple_in_ssa_p (id->src_cfun)) > + return; > + > + if (!opt_for_fn (id->dst_fn, flag_var_tracking_assignments)) > + return; > + > + for (var = DECL_ARGUMENTS (id->src_fn); > + var; var = DECL_CHAIN (var)) > + reset_debug_binding (id, var, &bindings); > + > + FOR_EACH_LOCAL_DECL (id->src_cfun, ix, var) > + reset_debug_binding (id, var, &bindings); > + > + gsi_insert_seq_before_without_update (&gsi, bindings, GSI_SAME_STMT); > +} > + > /* If STMT is a GIMPLE_CALL, replace it with its inline expansion. */ > > static bool > @@ -4650,6 +4704,8 @@ expand_call_inline (basic_block bb, gimple stmt, copy_body_data *id) > GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE), > bb, return_block, NULL); > > + reset_debug_bindings (id, stmt_gsi); > + > /* Reset the escaped solution. */ > if (cfun->gimple_df) > pt_solution_reset (&cfun->gimple_df->escaped); > > > -- > Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ > You must be the change you wish to see in the world. -- Gandhi > Be Free! -- http://FSFLA.org/ FSF Latin America board member > Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
On Mar 4, 2015, Richard Biener <richard.guenther@gmail.com> wrote: > Compile-time was slightly faster with the patch, 45s vs. 47s, > but the machine wasn't completely un-loaded. var-tracking parts: > unpatched: > variable tracking : 0.63 ( 1%) usr 0.03 ( 1%) sys 0.82 ( > 2%) wall 28641 kB ( 2%) ggc > var-tracking dataflow : 3.72 ( 8%) usr 0.04 ( 1%) sys 3.65 ( > 7%) wall 1337 kB ( 0%) ggc > var-tracking emit : 2.63 ( 6%) usr 0.02 ( 1%) sys 2.55 ( > 5%) wall 148835 kB ( 8%) ggc > patched: > variable tracking : 0.64 ( 1%) usr 0.01 ( 0%) sys 0.72 ( > 1%) wall 24202 kB ( 1%) ggc > var-tracking dataflow : 1.96 ( 4%) usr 0.01 ( 0%) sys 1.94 ( > 4%) wall 1326 kB ( 0%) ggc > var-tracking emit : 1.46 ( 3%) usr 0.02 ( 0%) sys 1.49 ( > 3%) wall 46980 kB ( 3%) ggc > we have at the point of RTL expansion 56% more debug statements > (988231 lines with # DEBUG in the .optimized dump out of > 1212518 lines in total vs. 630666 out of 854953). So we go from > roughly 1 debug stmt per real stmt to 1.5 debug stmts per real stmt. So, if I got this right, all these extra debug stmts and insns had no effect whatsoever on compile time proper. The reduction in compile time can be entirely accounted for by the time they save in the var-tracking parts, and any compile time increase they bring about in other passes is negligible. Does this match your assessment of the impact of the patch? > we have two pairs of DEBUG stmts for dD.570173 here, binding > to _25 and then immediately resetting. They're at different lines, and associated with different statements, so once we have stmt frontiers support in GCC and GDB, you will be able to stop between them an inspect the state, regardless of the absence of executable code between them.
On March 5, 2015 8:26:42 PM CET, Alexandre Oliva <aoliva@redhat.com> wrote: >On Mar 4, 2015, Richard Biener <richard.guenther@gmail.com> wrote: > >> Compile-time was slightly faster with the patch, 45s vs. 47s, >> but the machine wasn't completely un-loaded. var-tracking parts: > >> unpatched: > >> variable tracking : 0.63 ( 1%) usr 0.03 ( 1%) sys 0.82 ( >> 2%) wall 28641 kB ( 2%) ggc >> var-tracking dataflow : 3.72 ( 8%) usr 0.04 ( 1%) sys 3.65 ( >> 7%) wall 1337 kB ( 0%) ggc >> var-tracking emit : 2.63 ( 6%) usr 0.02 ( 1%) sys 2.55 ( >> 5%) wall 148835 kB ( 8%) ggc > >> patched: > >> variable tracking : 0.64 ( 1%) usr 0.01 ( 0%) sys 0.72 ( >> 1%) wall 24202 kB ( 1%) ggc >> var-tracking dataflow : 1.96 ( 4%) usr 0.01 ( 0%) sys 1.94 ( >> 4%) wall 1326 kB ( 0%) ggc >> var-tracking emit : 1.46 ( 3%) usr 0.02 ( 0%) sys 1.49 ( >> 3%) wall 46980 kB ( 3%) ggc > >> we have at the point of RTL expansion 56% more debug statements >> (988231 lines with # DEBUG in the .optimized dump out of >> 1212518 lines in total vs. 630666 out of 854953). So we go from >> roughly 1 debug stmt per real stmt to 1.5 debug stmts per real stmt. > >So, if I got this right, all these extra debug stmts and insns had no >effect whatsoever on compile time proper. The reduction in compile >time >can be entirely accounted for by the time they save in the var-tracking >parts, and any compile time increase they bring about in other passes >is >negligible. > >Does this match your assessment of the impact of the patch? For the effect on tramp3d, yes. The positive effect on var-tracking compile time really looks good. So I'm tempted to approve the patch for 5.0. >> we have two pairs of DEBUG stmts for dD.570173 here, binding >> to _25 and then immediately resetting. > >They're at different lines, and associated with different statements, >so >once we have stmt frontiers support in GCC and GDB, you will be able to >stop between them an inspect the state, regardless of the absence of >executable code between them. I wonder why we use the same decl uid for two different inline instances though. Do we not remap them during inlining? Richard.
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index d8abe03..5b58d8b 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -4345,6 +4345,60 @@ add_local_variables (struct function *callee, struct function *caller, } } +/* Add to BINDINGS a debug stmt resetting SRCVAR if inlining might + have brought in or introduced any debug stmts for SRCVAR. */ + +static inline void +reset_debug_binding (copy_body_data *id, tree srcvar, gimple_seq *bindings) +{ + tree *remappedvarp = id->decl_map->get (srcvar); + + if (!remappedvarp) + return; + + if (TREE_CODE (*remappedvarp) != VAR_DECL) + return; + + if (*remappedvarp == id->retvar || *remappedvarp == id->retbnd) + return; + + tree tvar = target_for_debug_bind (*remappedvarp); + if (!tvar) + return; + + gdebug *stmt = gimple_build_debug_bind (tvar, NULL_TREE, + id->call_stmt); + gimple_seq_add_stmt (bindings, stmt); +} + +/* For each inlined variable for which we may have debug bind stmts, + add before GSI a final debug stmt resetting it, marking the end of + its life, so that var-tracking knows it doesn't have to compute + further locations for it. */ + +static inline void +reset_debug_bindings (copy_body_data *id, gimple_stmt_iterator gsi) +{ + tree var; + unsigned ix; + gimple_seq bindings = NULL; + + if (!gimple_in_ssa_p (id->src_cfun)) + return; + + if (!opt_for_fn (id->dst_fn, flag_var_tracking_assignments)) + return; + + for (var = DECL_ARGUMENTS (id->src_fn); + var; var = DECL_CHAIN (var)) + reset_debug_binding (id, var, &bindings); + + FOR_EACH_LOCAL_DECL (id->src_cfun, ix, var) + reset_debug_binding (id, var, &bindings); + + gsi_insert_seq_before_without_update (&gsi, bindings, GSI_SAME_STMT); +} + /* If STMT is a GIMPLE_CALL, replace it with its inline expansion. */ static bool @@ -4650,6 +4704,8 @@ expand_call_inline (basic_block bb, gimple stmt, copy_body_data *id) GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE), bb, return_block, NULL); + reset_debug_bindings (id, stmt_gsi); + /* Reset the escaped solution. */ if (cfun->gimple_df) pt_solution_reset (&cfun->gimple_df->escaped);
This patch fixes a problem that has been with us for several years. Variable tracking has no idea about the end of the lifetime of inlined variables, so it keeps on computing locations for them over and over, even though the computed locations make no sense whatsoever because the variable can't even be accessed any more. With this patch, we unbind all inlined variables at the point the inlined function returns to, so that the locations for those variables will not be touched any further. In theory, we could do something similar to non-inlined auto variables, when they go out of scope, but their decls apply to the entire function and I'm told gdb sort-of expects the variables to be accessible throughout the function, so I'm not tackling that in this patch, for I'm happy enough with what this patch gets us: - almost 99% reduction in the output asm for the PR testcase - more than 90% reduction in the peak memory use compiling that testcase - 63% reduction in the compile time for that testcase What's scary is that the testcase is not particularly pathological. Any function that calls a longish sequence of inlined functions, that in turn call other inline functions, and so on, something that's not particularly unusual in C++, will likely observe significant improvement, as we won't see growing sequences of var_location notes after each call or so, as var-tracking computes a new in-stack location for the implicit this argument of each previously-inlined function. Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install? Reset inlined debug variables at the end of the inlined function From: Alexandre Oliva <aoliva@redhat.com> for gcc/ChangeLog PR debug/58315 * tree-inline.c (reset_debug_binding): New. (reset_debug_bindings): Likewise. (expand_call_inline): Call it. --- gcc/tree-inline.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+)