[PR54693] loss of debug info in jump threading and loop ivopts

Both jump threading and loop induction variable optimizations were
dropping useful debug information, and it took improvements in both for
debug info about relevant variables in the enclosed testcase to survive
all the way to the end.

The first problem was that jump threading could bypass blocks containing
debug stmts, losing the required bindings.  The solution was to
propagate bypassed debug insns into the target of the bypass.  Even if
the new confluence ends up introducing new PHI nodes, the propagated
debug binds will resolve to them, just as intended.  This is implemented
in the 4th patch below: vta-jump-thread-prop-debug-pr54693.patch

The second part of the problem was that, when performing loop ivopts,
we'd often drop PHI nodes and other SSA names for unused ivs.  Although
we had code to propagate SSA DEFs into debug uses upon their removal,
this couldn't take care of PHI nodes (no debug phis or conditional debug
binds), and it was precisely at the removal of a PHI node that we
dropped debug info for the loop in the provided testcase.  Once Jakub
figured out how to compute an unused iv out of available ivs (thanks!),
it was easy to introduce debug temps with the expressions to compute
them, so that debug binds would remain correct as long as the unused iv
can still be computed out of the others.  (If it can't, we'll still try
propagation, but may end up losing at the PHI nodes).  I had thought
that replacing only the PHI nodes would suffice, but it turned out that
replacing all unused iv defs with their equivalent used-IV expressions
got us better coverage, so this is what the 5th patch below does:
vta-ivopts-replace-dropped-phis-pr54693.patch

Regression testing revealed -fcompare-debug regressions exposed by these
patches.

x86-specific code introduces pre-reload scheduling dependencies between
calls and likely-spilled parameter registers, but it does so in a way
that's AFAICT buggy, and fragile to the presence of debug insns at the
top of a block: we won't introduce a dependency for the first insn of
the block, even if we'd rather have such a dependency.  This fails to
achieve the intended effect, and it also causes codegen differences when
the first insn in the block happens to be a debug insn, for then we will
add the intended dependency.  The first patch below,
vta-stabilize-i386-call-args-sched-pr54693.patch, skips leading debug
insns, so as to remove the difference, and moves the end of the backward
scan to the insn before the first actual insn, so that we don't refrain
from considering it for dependencies.  This in turn required an
additional test to make sure we wouldn't go past the nondebug head if
first_arg happened to be the head.

The introduction of debug insns at new spots also exposed a bug in loop
unrolling: we'd unroll a loop a different number of times depending on
whether or not its latch is an empty block.  The propagation or
introduction of debug insns in previously-empty latch blocks caused
loops to be unrolled a different number of times depending on the
presence of the debug insns, which triggered -fcompare-debug errors.
The fix was to count only nondebug insns towards the decision on whether
the latch block was empty.  This is implemented in the second patch
below: vta-stabilize-loop-unroll-empty-latch-check-pr54693.patch

Guality testsuite regressions given the patches above revealed that the
fast DCE global dead debug substitution introduced for PR54551 was not
correct: it was possible for us to visit, for the last time, a block
with a REG used in debug stmts after its death before we visited the
block with the debug use.  As a result, we'd fail to emit a debug temp
at the not-revisited block, and the debug temp bindings introduced at
other blocks might be insufficient to get a value to the debug use
point, or even get an incorrect value there.  I've fixed this problem by
using the DU chains at the time we realize a dead debug use uses a value
from a different block, adding at that moment debug temps at all defs of
the REG and replacing all debug uses with the debug temp, and arranging
that we don't do that again for the same REG.  This ensures that,
regardless of the order in which we visit blocks, we'll get correct
debug temp bindings.  This fix is implemented in the 3rd patch below:
vta-dce-globalize-debug-review-pr54551-pr54693.patch

While looking into some crashes due to a bug in an earlier version of
the patch described above, I suspected the problem had to do with our DF
rescanning newly-emitted debug temps right away.  I know SSA updating
messes with SSA def/use chains, and I suspected DF rescanning might
invalidate def/use chains as well.  It turned out that the problem was
unrelated, but I kind of liked moving the initialization of the
to_rescan bitmap out of the loop over uses, and deferring the rescanning
of the new debug temp with it.  However, since that's not a required
part of the proposed fixes, I split it out into a separate patch, the
6th and last below: vta-valtrack-defer-debug-temp-rescan-pr54693.patch

The patches were regstrapped together, on i686- and x86_64-linux-gnu,
and the only regression was guality/pr43051 on i686: a debug temp would
now be reset by the scheduler as it moved a def of a REG before a debug
temp that used the old value of the REG.  I suppose we could improve
sched so as to try and emit a debug temp before the moved-ahead insn,
and then replace the REG with the debug temp in the debug use, instead
of resetting it, but since this is not exactly trivial, it's not clear
how much benefit it would bring and at what cost, and this patchset had
already been cooking for a while, I figured I'd stop at this point and
post the patchset with this caveat.

Ok to install?

[PR54693] loss of debug info in jump threading and loop ivopts

Commit Message

Comments

Patch