Message ID | orr4olzqwe.fsf@livre.localdomain |
---|---|
State | New |
Headers | show |
On Fri, Oct 26, 2012 at 04:30:41AM -0200, Alexandre Oliva wrote: > From: Alexandre Oliva <lxoliva@fsfla.org> > > for gcc/ChangeLog > > PR debug/54693 > * config/i386/i386.c (add_parameter_dependencies): Stop > backward scan at the insn before the incoming head. > (ix86_dependencies_evaluation_hook): Skip debug insns. Stop > if first_arg is head. Ok. > Stabilize loop-unroll empty latch check WRT debug insns > > From: Alexandre Oliva <lxoliva@fsfla.org> > > for gcc/ChangeLog > > PR debug/54693 > * loop-unroll.c (loop_exit_at_end_p): Skip debug insns. Ok, this is obvious. > Promote dead debug uses with immediate global replacement > > From: Alexandre Oliva <lxoliva@fsfla.org> > > for gcc/ChangeLog > > PR debug/54551 > PR debug/54693 > * valtrack.c (dead_debug_global_find): Accept NULL dtemp. > (dead_debug_global_insert): Return new entry. > (dead_debug_global_replace_temp): Return early if REG is no > longer in place, or if dtemp was already substituted. > (dead_debug_promote_uses): Insert for all defs and replace all > debug uses at once. > (dead_debug_local_finish): Release used after promotion. > (dead_debug_insert_temp): Stop if dtemp is NULL. Ok. > From: Alexandre Oliva <lxoliva@fsfla.org> > > for gcc/ChangeLog > > PR debug/54693 > * tree-ssa-threadedge.c (thread_around_empty_block): Copy > debug temps from predecessor before threading. > > for gcc/testsuite/ChangeLog > > PR debug/54693 > * gcc.dg/guality/pr54693.c: New. Ok with: > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/guality/pr54693.c > @@ -0,0 +1,30 @@ > +/* PR debug/54693 */ > +/* { dg-do run } */ > +/* { dg-options "-g" } */ > + > +#include <stdlib.h> > + The #include <stdlib.h> looks useless to me, please remove it and adjust gdb-test line accordingly. > --- a/gcc/tree-ssa-threadedge.c > +++ b/gcc/tree-ssa-threadedge.c > @@ -637,6 +637,24 @@ thread_around_empty_block (edge taken_edge, > if (!single_pred_p (bb)) > return NULL; > > + /* Before threading, copy DEBUG stmts from the predecessor, so that > + we don't lose the bindings as we redirect the edges. */ > + if (MAY_HAVE_DEBUG_STMTS) > + { > + gsi = gsi_after_labels (bb); > + for (gimple_stmt_iterator si = gsi_last_bb (taken_edge->src); > + !gsi_end_p (si); gsi_prev (&si)) > + { > + stmt = gsi_stmt (si); > + if (!is_gimple_debug (stmt)) > + continue; > + > + stmt = gimple_copy (stmt); > + /* ??? Should we drop the location of the copy? */ > + gsi_insert_before (&gsi, stmt, GSI_NEW_STMT); Don't we always ignore gimple_location on debug_stmts anyway? > Replace PHI nodes of dropped ivs with debug temps > > From: Alexandre Oliva <lxoliva@fsfla.org>, Jakub Jelinek <jakub@redhat.com> > > for gcc/ChangeLog > > PR debug/54693 > * tree-ssa-loop-ivopts.c (remove_unused_ivs): Emit debug temps > for dropped IV sets. I'd appreciate review of somebody knowledgeable about IVopts here. > + gimple def_temp = gimple_build_debug_bind (vexpr, comp, NULL); Just wonder whether the above is sufficient, comp might be somewhat larger expression than what we used to allow in debug bind stmts so far, where usually it would be something with about the same complexity as valid GIMPLE statements. I guess both tree-ssa-operands.c and cfgexpand.c handle it, so perhaps it is fine as is, other option would be to split the expression into smaller chunks using debug temporaries. I.e. instead of adding DEBUG D#123 => (sometype) (((someothertype) i_12 - 48) / 4) or whatever get_computation_at returns emit DEBUG D#124 => (someothertype) i_12 DEBUG D#125 => D#124 - 48 DEBUG D#126 => D#125 / 4 DEBUG D#123 => (sometype) D#126 If get_computation_at results are bound to say < 10 GIMPLE operations or something similar, it might be ok as is. > From: Alexandre Oliva <lxoliva@fsfla.org> > > for gcc/ChangeLog > > PR debug/54693 > * gcc/valtrack.c (dead_debug_insert_temp): Defer rescan of > newly-emitted debug insn. I guess ok, but please install it separately. Jakub
On 10/26/2012 12:30 AM, Alexandre Oliva wrote: > Propagate debug stmts for jump threading > > From: Alexandre Oliva<lxoliva@fsfla.org> > > for gcc/ChangeLog > > PR debug/54693 > * tree-ssa-threadedge.c (thread_around_empty_block): Copy > debug temps from predecessor before threading. So my only concerns here are what happens when we copy the debug statements, but end up not threading the jump. At this stage we're just recording the jump threading opportunities; only a subset of the registered jump threading opportunities will be optimized. So will these new debug statements cause problems if the jump isn't threaded? Closely related, do these new debug statements need to be rewritten into SSA form? If so we have a problem if we copy the debug statements, but ultimately thread no jumps at all. In that scenario we don't update the SSA graph. jeff
On Fri, Oct 26, 2012 at 11:42:20AM -0600, Jeff Law wrote: > On 10/26/2012 12:30 AM, Alexandre Oliva wrote: > >Propagate debug stmts for jump threading > >for gcc/ChangeLog > > > > PR debug/54693 > > * tree-ssa-threadedge.c (thread_around_empty_block): Copy > > debug temps from predecessor before threading. > So my only concerns here are what happens when we copy the debug > statements, but end up not threading the jump. At this stage we're > just recording the jump threading opportunities; only a subset of > the registered jump threading opportunities will be optimized. > > So will these new debug statements cause problems if the jump isn't > threaded? Closely related, do these new debug statements need to be > rewritten into SSA form? If so we have a problem if we copy the > debug statements, but ultimately thread no jumps at all. In that > scenario we don't update the SSA graph. The debug stmts are already in SSA form, and, being debug stmts, they are always just consumers of SSA names, not defining stmts, and Alex' patch is just copying them over from a predecessor bb to a successor bb that has a single predecessor (the one from which they are copied). So, all SSA names that are used in debug stmts in the predecessor must be still valid at the beginning of the successor bb. If the successor has any debug stmts on its own, they will go after these copied ones and thus override anything in them if they need to. Jakub
On 10/26/2012 11:51 AM, Jakub Jelinek wrote: > The debug stmts are already in SSA form, and, being debug stmts, they are > always just consumers of SSA names, not defining stmts, and Alex' patch > is just copying them over from a predecessor bb to a successor bb that > has a single predecessor (the one from which they are copied). > So, all SSA names that are used in debug stmts in the predecessor must > be still valid at the beginning of the successor bb. If the successor > has any debug stmts on its own, they will go after these copied ones > and thus override anything in them if they need to. OK. I wasn't aware they were strictly SSA_NAME consumers. In that case, then yes, we're fine. Patch approved. jeff
On Oct 26, 2012, Jakub Jelinek <jakub@redhat.com> wrote: > On Fri, Oct 26, 2012 at 04:30:41AM -0200, Alexandre Oliva wrote: >> From: Alexandre Oliva <lxoliva@fsfla.org> Fixed .git/config and the patches to use my @redhat.com address for these patches. >> +++ b/gcc/testsuite/gcc.dg/guality/pr54693.c > The #include <stdlib.h> looks useless to me, please remove it > and adjust gdb-test line accordingly. Thanks for catching this, it was a left-over from a failed experiment. > Don't we always ignore gimple_location on debug_stmts anyway? Right now, we mostly do, but they may become relevant once stmt frontier notes are implemented. >> + gimple def_temp = gimple_build_debug_bind (vexpr, comp, NULL); > Just wonder whether the above is sufficient It surely works. It may prevent common subexpression equivalences from being detected by VTA, but it also saves some temps. The exprs are of linear scaling, so they should be relatively small, but I can't guess whether there'd be any measurable benefit from splitting them up. Thanks for the reviews.
On Fri, Oct 26, 2012 at 8:30 AM, Alexandre Oliva <aoliva@redhat.com> wrote: > Both jump threading and loop induction variable optimizations were > dropping useful debug information, and it took improvements in both for > debug info about relevant variables in the enclosed testcase to survive > all the way to the end. > > The first problem was that jump threading could bypass blocks containing > debug stmts, losing the required bindings. The solution was to > propagate bypassed debug insns into the target of the bypass. Even if > the new confluence ends up introducing new PHI nodes, the propagated > debug binds will resolve to them, just as intended. This is implemented > in the 4th patch below: vta-jump-thread-prop-debug-pr54693.patch > > The second part of the problem was that, when performing loop ivopts, > we'd often drop PHI nodes and other SSA names for unused ivs. Although > we had code to propagate SSA DEFs into debug uses upon their removal, > this couldn't take care of PHI nodes (no debug phis or conditional debug > binds), and it was precisely at the removal of a PHI node that we > dropped debug info for the loop in the provided testcase. Once Jakub > figured out how to compute an unused iv out of available ivs (thanks!), > it was easy to introduce debug temps with the expressions to compute > them, so that debug binds would remain correct as long as the unused iv > can still be computed out of the others. (If it can't, we'll still try > propagation, but may end up losing at the PHI nodes). I had thought > that replacing only the PHI nodes would suffice, but it turned out that > replacing all unused iv defs with their equivalent used-IV expressions > got us better coverage, so this is what the 5th patch below does: > vta-ivopts-replace-dropped-phis-pr54693.patch + if (count > 1) + { + tree vexpr = make_node (DEBUG_EXPR_DECL); + DECL_ARTIFICIAL (vexpr) = 1; + TREE_TYPE (vexpr) = TREE_TYPE (comp); + if (SSA_NAME_VAR (def)) + DECL_MODE (vexpr) = DECL_MODE (SSA_NAME_VAR (def)); + else + DECL_MODE (vexpr) = TYPE_MODE (TREE_TYPE (vexpr)); simply always use the TREE_TYPE path. TYPE_MODE is always valid for SSA_NAMEs. + FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def) + { + if (!gimple_debug_bind_p (stmt)) + continue; + + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) + SET_USE (use_p, comp); + + if (!comp) + BREAK_FROM_IMM_USE_STMT (imm_iter); how does comp magically become NULL_TREE here? Btw, what's all the fuzz with IV candidates, etc? At least for non-PHIs I don't see why the regular release_ssa_name way of doing things does not work. IVOPTs is slow enough ... Richard. > > Regression testing revealed -fcompare-debug regressions exposed by these > patches. > > x86-specific code introduces pre-reload scheduling dependencies between > calls and likely-spilled parameter registers, but it does so in a way > that's AFAICT buggy, and fragile to the presence of debug insns at the > top of a block: we won't introduce a dependency for the first insn of > the block, even if we'd rather have such a dependency. This fails to > achieve the intended effect, and it also causes codegen differences when > the first insn in the block happens to be a debug insn, for then we will > add the intended dependency. The first patch below, > vta-stabilize-i386-call-args-sched-pr54693.patch, skips leading debug > insns, so as to remove the difference, and moves the end of the backward > scan to the insn before the first actual insn, so that we don't refrain > from considering it for dependencies. This in turn required an > additional test to make sure we wouldn't go past the nondebug head if > first_arg happened to be the head. > > The introduction of debug insns at new spots also exposed a bug in loop > unrolling: we'd unroll a loop a different number of times depending on > whether or not its latch is an empty block. The propagation or > introduction of debug insns in previously-empty latch blocks caused > loops to be unrolled a different number of times depending on the > presence of the debug insns, which triggered -fcompare-debug errors. > The fix was to count only nondebug insns towards the decision on whether > the latch block was empty. This is implemented in the second patch > below: vta-stabilize-loop-unroll-empty-latch-check-pr54693.patch > > Guality testsuite regressions given the patches above revealed that the > fast DCE global dead debug substitution introduced for PR54551 was not > correct: it was possible for us to visit, for the last time, a block > with a REG used in debug stmts after its death before we visited the > block with the debug use. As a result, we'd fail to emit a debug temp > at the not-revisited block, and the debug temp bindings introduced at > other blocks might be insufficient to get a value to the debug use > point, or even get an incorrect value there. I've fixed this problem by > using the DU chains at the time we realize a dead debug use uses a value > from a different block, adding at that moment debug temps at all defs of > the REG and replacing all debug uses with the debug temp, and arranging > that we don't do that again for the same REG. This ensures that, > regardless of the order in which we visit blocks, we'll get correct > debug temp bindings. This fix is implemented in the 3rd patch below: > vta-dce-globalize-debug-review-pr54551-pr54693.patch > > While looking into some crashes due to a bug in an earlier version of > the patch described above, I suspected the problem had to do with our DF > rescanning newly-emitted debug temps right away. I know SSA updating > messes with SSA def/use chains, and I suspected DF rescanning might > invalidate def/use chains as well. It turned out that the problem was > unrelated, but I kind of liked moving the initialization of the > to_rescan bitmap out of the loop over uses, and deferring the rescanning > of the new debug temp with it. However, since that's not a required > part of the proposed fixes, I split it out into a separate patch, the > 6th and last below: vta-valtrack-defer-debug-temp-rescan-pr54693.patch > > The patches were regstrapped together, on i686- and x86_64-linux-gnu, > and the only regression was guality/pr43051 on i686: a debug temp would > now be reset by the scheduler as it moved a def of a REG before a debug > temp that used the old value of the REG. I suppose we could improve > sched so as to try and emit a debug temp before the moved-ahead insn, > and then replace the REG with the debug temp in the debug use, instead > of resetting it, but since this is not exactly trivial, it's not clear > how much benefit it would bring and at what cost, and this patchset had > already been cooking for a while, I figured I'd stop at this point and > post the patchset with this caveat. > > Ok to install? > > > > > > -- > Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ > You must be the change you wish to see in the world. -- Gandhi > Be Free! -- http://FSFLA.org/ FSF Latin America board member > Free Software Evangelist Red Hat Brazil Compiler Engineer >
On Tue, Oct 30, 2012 at 03:51:19PM +0100, Richard Biener wrote: > + FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def) > + { > + if (!gimple_debug_bind_p (stmt)) > + continue; > + > + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) > + SET_USE (use_p, comp); > + > + if (!comp) > + BREAK_FROM_IMM_USE_STMT (imm_iter); > > how does comp magically become NULL_TREE here? Looks like pasto to me, from the first loop. BTW, I'd also think that the first loop should set count = 2 if the debug stmt already has a non-trivial expression (i.e. rhs isn't just the SSA_NAME), to force a debug temp in that case to avoid creating too large debug stmts. > Btw, what's all the fuzz with IV candidates, etc? At least for non-PHIs > I don't see why the regular release_ssa_name way of doing things does > not work. IVOPTs is slow enough ... Even if it is non-PHI, release_ssa_name will replace it with the definition, and then on another release_ssa_name again and again, and finally either be lucky enough that some SSA_NAME stays (is an IV that has been kept), but more often you'll just reach the PHI node and end up with a long list of: DEBUG D#7 => NULL DEBUG D#6 => D#7 DEBUG D#5 => D#6 DEBUG D#4 => D#5 DEBUG D#3 => D#4 DEBUG D#2 => D#3 DEBUG D#1 => D#2 (the NULL because of PHI). We don't need to find optimal IV replacement, so the code tries just a couple of them (perhaps the 64 should be a param, and perhaps could be lower by default), it just helps if the expression is smaller (smaller debug info), and if it contains as few SSA_NAMEs as possible (then it is more likely it will actually be useful). Jakub
On Oct 30, 2012, Jakub Jelinek <jakub@redhat.com> wrote: > On Tue, Oct 30, 2012 at 03:51:19PM +0100, Richard Biener wrote: >> + if (!comp) >> + BREAK_FROM_IMM_USE_STMT (imm_iter); >> how does comp magically become NULL_TREE here? > Looks like pasto to me, from the first loop. More like a left-over from before I split the loop into two, but yeah, I'll take that out. > BTW, I'd also think that the first loop should set count = 2 if > the debug stmt already has a non-trivial expression Good! Will do.
Defer rescan of debug insns emitted for debug temps for dead debug uses From: Alexandre Oliva <lxoliva@fsfla.org> for gcc/ChangeLog PR debug/54693 * gcc/valtrack.c (dead_debug_insert_temp): Defer rescan of newly-emitted debug insn. --- gcc/valtrack.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/valtrack.c b/gcc/valtrack.c index f6c0db4..8cc3269 100644 --- a/gcc/valtrack.c +++ b/gcc/valtrack.c @@ -688,7 +688,9 @@ dead_debug_insert_temp (struct dead_debug_local *debug, unsigned int uregno, bind = emit_debug_insn_after (bind, insn); else bind = emit_debug_insn_before (bind, insn); - df_insn_rescan (bind); + if (debug->to_rescan == NULL) + debug->to_rescan = BITMAP_ALLOC (NULL); + bitmap_set_bit (debug->to_rescan, INSN_UID (bind)); /* Adjust all uses. */ while ((cur = uses)) @@ -699,8 +701,6 @@ dead_debug_insert_temp (struct dead_debug_local *debug, unsigned int uregno, *DF_REF_REAL_LOC (cur->use) = gen_lowpart_SUBREG (GET_MODE (*DF_REF_REAL_LOC (cur->use)), dval); /* ??? Should we simplify subreg of subreg? */ - if (debug->to_rescan == NULL) - debug->to_rescan = BITMAP_ALLOC (NULL); bitmap_set_bit (debug->to_rescan, INSN_UID (DF_REF_INSN (cur->use))); uses = cur->next; XDELETE (cur);