[llvm-dev] [DebugInfo] A value-tracking variable location update

Fri Nov 6 10:26:51 PST 2020

Hi debug-info folks,

Time for another update on the variable location "instruction referencing"
implementation I've been working on; see the RFCs [0, 1] for background. It's now at
the point where I'd call it "done" (as far as software ever is), and so it's a
good time to look at what results it produces. And here are the
scores-on-the-doors using llvm-locstats, on clang-3.4 RelWithDebInfo first in
"normal" mode and then with -Xclang -fexperimental-debug-variable-locations.

"Normal":

 =================================================
     cov%           samples         percentage(~)
 -------------------------------------------------
   0%               765406               22%
   (0%,10%)          45179                1%
   [10%,20%)         51699                1%
   [20%,30%)         52044                1%
   [30%,40%)         46905                1%
   [40%,50%)         48292                1%
   [50%,60%)         61342                1%
   [60%,70%)         58315                1%
   [70%,80%)         69848                2%
   [80%,90%)         81937                2%
   [90%,100%)       101384                2%
   100%            2032034               59%
 =================================================
 -the number of debug variables processed: 3414385
 -PC ranges covered: 61%
 -------------------------------------------------
 -total availability: 64%
 =================================================

With instruction referencing:

 =================================================
     cov%           samples         percentage(~)
 -------------------------------------------------
   0%               751201               21%
   (0%,10%)          40708                1%
   [10%,20%)         44909                1%
   [20%,30%)         47544                1%
   [30%,40%)         41630                1%
   [40%,50%)         42742                1%
   [50%,60%)         56692                1%
   [60%,70%)         53796                1%
   [70%,80%)         64476                1%
   [80%,90%)         73836                2%
   [90%,100%)        74423                2%
   100%            2123749               62%
 =================================================
 -the number of debug variables processed: 3415706
 -PC ranges covered: 68%
 -------------------------------------------------
 -total availability: 64%
 =================================================

The first observation: a significant increase in the byte-coverage statistic,
meaning that we're able to track variable locations for longer and across more
code. This was one of the main aims of this work, having better tracking of
the locations that we know. The increase of seven percentage points includes an
additional two percentage points of entry-value locations. If we disable entry
value production then the scope-bytes-covered statistic moves from 59% to 64%,
which is still a decent improvement.
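
To make the shift between buckets easier to see, here is a minimal sketch that diffs the two tables above. The row text is copied from the llvm-locstats output in this mail; the parse_buckets helper and the regular expression are my own illustration, not part of llvm-locstats.

```python
import re

# Rows copied from the two llvm-locstats tables in this mail.
NORMAL = """
   0%               765406               22%
   (0%,10%)          45179                1%
   [10%,20%)         51699                1%
   [20%,30%)         52044                1%
   [30%,40%)         46905                1%
   [40%,50%)         48292                1%
   [50%,60%)         61342                1%
   [60%,70%)         58315                1%
   [70%,80%)         69848                2%
   [80%,90%)         81937                2%
   [90%,100%)       101384                2%
   100%            2032034               59%
"""

INSTR_REF = """
   0%               751201               21%
   (0%,10%)          40708                1%
   [10%,20%)         44909                1%
   [20%,30%)         47544                1%
   [30%,40%)         41630                1%
   [40%,50%)         42742                1%
   [50%,60%)         56692                1%
   [60%,70%)         53796                1%
   [70%,80%)         64476                1%
   [80%,90%)         73836                2%
   [90%,100%)        74423                2%
   100%            2123749               62%
"""

# Hypothetical row pattern: bucket label, sample count, percentage.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+\d+%\s*$")


def parse_buckets(table):
    """Map each coverage bucket label to its sample count."""
    buckets = {}
    for line in table.splitlines():
        match = ROW.match(line)
        if match:
            buckets[match.group(1)] = int(match.group(2))
    return buckets


normal = parse_buckets(NORMAL)
instr_ref = parse_buckets(INSTR_REF)

# Positive delta: more variables fall in that bucket with instruction referencing.
deltas = {label: instr_ref[label] - normal[label] for label in normal}

for label, delta in deltas.items():
    print(f"{label:<12}{delta:+8d}")
```

With these numbers, every bucket shrinks except 100%, which grows by 91715 variables: partially covered variables are moving towards full coverage.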

The next observation is that the ``total availability'' of variables hasn't
changed. This isn't the full story -- if you give an absolute name to every
variable with a location in the clang binary, there are 6949 dropped locations
and 22564 completely new locations, meaning roughly 1% of all variables in the
program have changed; it's just hidden by the rounding in the statistics. More
detail on the nature of the changes is below. I was hoping for more false locations
to be dropped; it's quite likely that there are many more false locations
dropped within variables that have more than one value, which aren't readily
reflected in these statistics.
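
As a worked check of why the availability figure doesn't move, here is a small sketch of the arithmetic. It assumes the dropped and new location counts are in the same units as the "debug variables processed" figure from llvm-locstats, which is my reading rather than something the tool reports directly.

```python
dropped = 6949              # locations only present in the "normal" build
new = 22564                 # locations only present with instruction referencing
total_variables = 3415706   # variables processed in the instruction referencing run

changed = dropped + new     # variables whose availability differs between builds
net_gain = new - dropped    # net change in variables that have a location

changed_pct = 100 * changed / total_variables
net_gain_points = 100 * net_gain / total_variables

print(f"changed:  {changed} ({changed_pct:.2f}% of variables)")
print(f"net gain: {net_gain} ({net_gain_points:.2f} percentage points)")
```

Under that assumption the net gain is under half a percentage point, which is too small to move a figure reported in whole percent.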

A natural question is: are all these new locations wrong, and the dropped
locations only dropped because of bugs? To address that, I picked 20 new
locations and 20 dropped locations at random and analysed why they happened.
The input samples can be found here [2], along with an llvm-reduce'd version of
each IR file. I confirmed the reason for the new/dropped location in the
reduced and original file, as llvm-reducing them can alter the reason why
something is dropped or not. Of the new locations, we previously could not
track the location because:
 * 14 DBG_VALUEs come after the vreg operand is out of liveness and are dropped
   by LiveDebugVariables.
 * 2 DBG_VALUEs are out of liveness and dropped by RegisterCoalescing
   out of conservativeness.
 * 2 DBG_VALUEs that appear before their operand is defined. This is out of
   liveness; instruction referencing saves them by preserving debug
   use-before-defs.
 * 2 DBG_VALUEs that are out of liveness after a branch, but the value is live
   down the other branch path.

All of these locations can be tracked with instruction referencing because
liveness is not a consideration, only availability in physical registers. 19 of
the new locations were correct, while one tracked the right value but picked
the wrong location for it, which I've now got a patch for.

For the dropped locations:
 * 8 false locations are dropped; they used to refer to the wrong value because
   of a failure in register coalescing, see the body of [3].
 * 3 locations are unnecessarily dropped when different subregisters are
   merged together in register coalescing.
 * 3 locations are unnecessarily dropped due to conservative tracking of PHI
   values (the code in D86814, can be fixed with more C++).
 * 2 of the samples didn't actually have a dropped location; instead they
   preserved an undef debug instruction in early-taildup, and my scripts picked
   this up as dropping a location.
 * 2 locations aren't tracked by InstrRefBasedLDV through a block that's
   out of scope, meaning the location never covers instructions that are in
   scope. VarLocBasedLDV is vulnerable to this too, but MachineSink can drop a
   DBG_VALUE on the far side of the scope gap, saving the location. See
   "Limitations" below.
 * 2 locations dropped during tail duplication: one in early-taildup which
   I haven't tried to address yet (see "Limitations"), one in late taildup
   where a block containing only debug instructions isn't correctly duplicated.

To summarise: all the new locations tracked the correct value and were not
trackable by DBG_VALUE variable-location tracking, although one sample exposed a
bug in picking the location. Roughly half of the dropped locations are actual
false locations,
the other half are due to unimplemented or limited handling of optimisations in
the instruction referencing code so far.
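
The sample tallies above can be checked with a short sketch. The counts come from the two lists in this mail; the grouping of dropped samples into "false", "limitation" and "not_a_drop" is my own reading of those bullet points.

```python
# Reasons the 20 sampled new locations could not be tracked before.
new_reasons = {
    "vreg out of liveness, dropped by LiveDebugVariables": 14,
    "out of liveness, dropped by RegisterCoalescing": 2,
    "DBG_VALUE before operand is defined": 2,
    "out of liveness after a branch": 2,
}

# Reasons for the 20 sampled dropped locations: (group, count).
dropped_reasons = {
    "false location from register coalescing failure": ("false", 8),
    "subregisters merged in register coalescing": ("limitation", 3),
    "conservative tracking of PHI values": ("limitation", 3),
    "undef debug instruction preserved in early-taildup": ("not_a_drop", 2),
    "block out of scope": ("limitation", 2),
    "tail duplication": ("limitation", 2),
}

totals = {}
for group, count in dropped_reasons.values():
    totals[group] = totals.get(group, 0) + count

sample_size = sum(count for _, count in dropped_reasons.values())

print("new samples:", sum(new_reasons.values()))
for group, count in totals.items():
    print(f"{group}: {count} of {sample_size}")
```

That gives 8 false locations, 10 drops caused by current limitations, and 2 samples that were not really drops.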

This pretty much fulfils the objective of this work: we're able to save a lot
more variable locations through the register allocator because we don't have to
be so conservative about liveness. Plus, the default behaviour of all
optimisations now is to _drop_ a variable location, as opposed to the existing
situation where after we leave SSA form, all bets are off.

Another question is how much this costs in compile time: a clang-3.4 build
using instruction referencing on my otherwise idle machine usually tracks
within 2% of a normal build. This is IMO expected given the larger amount of
debugging information being produced, and I haven't closely studied the
performance of a whole build using instruction referencing yet, so it'll
probably get better. A more recent change to InstrRefBasedLDV has added a big
slowdown though, so I'm going to skip reporting any performance results for
now.

Current situation
=================

Some of this work has landed; I've got some patches up for review [4] that
implement the core parts. I also have a long tail of tweaks and
location-salvaging in a tree here [5] which just fleshes out more optimisation
passes and installs bugfixes. (Commits there are not written to be human
consumable, alas.) There are no fatal flaws in the design as far as I'm aware,
although there are some annoyances (see "Limitations").

The biggest problem is that this all relies on a new LiveDebugValues
implementation that doesn't have sufficient test coverage yet, and is still
Somewhat Experimental (TM). Given the number of times an unpleasant performance
cliff has been found in VarLoc LiveDebugValues, it wants a long time to soak in
before being deployed.

Limitations
===========

Here's a non-exhaustive list of known problems. None of them are fatal IMO,
and all have only a small effect on variable availability:
 * Early tail duplication: like late tail duplication, this tears apart SSA
   information and can cause the same "Value" to be defined twice. This is
   solvable using the SSAUpdater utility, which early-taildup already uses.
 * Attaching a debug instruction number to a COPY instruction is highly
   undesirable because the COPY doesn't actually define a value, it just moves
   it between locations. At least one optimisation (X86 LEAtoMOV) transforms
   instructions into COPYs (LEA $rsp + 0 => COPY $rsp), which is unfortunate.
   This doesn't happen a lot though, and can be fixed by dropping a DBG_PHI
   of the COPYd register nearby. Plus it only happens post-regalloc, which
   makes it less of a problem.
 * Trivial def rematerialization: there's no pattern to rely on in how the
   register allocator rematerializes values, and so values can rematerialize
   in different registers dominating different parts of the CFG. It's hard to
   track the variable location after that, because it has multiple values in
   the eyes of InstrRefBasedLDV. My preference would be, seeing how these defs
   are effectively constants, to have the target describe such trivial defs
   in a DIExpression. That avoids having to track the location of a constant
   that we already know.
 * As mentioned in the list of dropped locations above, gaps in lexical
   scopes can lead to locations not being propagated sufficiently far, a
   problem for both variable-location tracking solutions as documented in
   PR48091. However, using DBG_VALUEs to track variable locations can save a
   few of them because MachineSink can sink DBG_VALUEs over the scope gap,
   whereas instruction-referencing tries to rely on tracking debug
   use-before-defs which don't propagate across scope gaps. More on how to
   resolve this in PR48091.

Next Steps
==========

While this isn't ready for general use yet, it'd be great to get as much as
possible into llvm-12 behind the -Xclang
-fexperimental-debug-variable-locations flag. That eases the path to testing
for consumers, which gives a greater chance of finding worst-case slowdowns in
advance of instruction referencing being generally available.

There's a decent amount of stuff under "Limitations" above that I can address,
plus some performance profiling is still needed. I imagine the next best thing
to do is add support for GlobalISel and some non-X86 backends (certain
TargetInstrInfo hooks need to perform debug-info bookkeeping), which would make
this all more appetising.

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
[1] http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html
[2] https://github.com/jmorse/llvm-inst-ref-test-samples
[3] https://reviews.llvm.org/D86813
[4] https://reviews.llvm.org/D88898
[5] https://github.com/jmorse/llvm-project/commit/0a702b967927d888bd222806252783359fc74d57

--
Thanks,
Jeremy