[llvm-dev] [DebugInfo] A value-tracking variable location update

Fri Nov 6 11:10:42 PST 2020

Awesome to read how it's coming along. I'm mostly on the sidelines of the
debug location work, but I had one or two clarifying questions.

On Fri, Nov 6, 2020 at 10:27 AM Jeremy Morse
<jeremy.morse.llvm at gmail.com> wrote:
>
> Hi debug-info folks,
>
> Time for another update on the variable location "instruction referencing"
> implementation I've been doing, see this RFC [0, 1] for background. It's now at
> the point where I'd call it "done" (as far as software ever is), and so it's a
> good time to look at what results it produces. And here are the
> scores-on-the-doors using llvm-locstats, on clang-3.4 RelWithDebInfo first in
> "normal" mode and then with -Xclang -fexperimental-debug-variable-locations.
> "normal":
>
>  =================================================
>      cov%           samples         percentage(~)
>  -------------------------------------------------
>    0%               765406               22%
>    (0%,10%)          45179                1%
>    [10%,20%)         51699                1%
>    [20%,30%)         52044                1%
>    [30%,40%)         46905                1%
>    [40%,50%)         48292                1%
>    [50%,60%)         61342                1%
>    [60%,70%)         58315                1%
>    [70%,80%)         69848                2%
>    [80%,90%)         81937                2%
>    [90%,100%)       101384                2%
>    100%            2032034               59%
>  =================================================
>  -the number of debug variables processed: 3414385
>  -PC ranges covered: 61%
>  -------------------------------------------------
>  -total availability: 64%
>  =================================================
>
> With instruction referencing:
>
>  =================================================
>      cov%           samples         percentage(~)
>  -------------------------------------------------
>    0%               751201               21%
>    (0%,10%)          40708                1%
>    [10%,20%)         44909                1%
>    [20%,30%)         47544                1%
>    [30%,40%)         41630                1%
>    [40%,50%)         42742                1%
>    [50%,60%)         56692                1%
>    [60%,70%)         53796                1%
>    [70%,80%)         64476                1%
>    [80%,90%)         73836                2%
>    [90%,100%)        74423                2%
>    100%            2123749               62%
>  =================================================
>  -the number of debug variables processed: 3415706
>  -PC ranges covered: 68%
>  -------------------------------------------------
>  -total availability: 64%
>  =================================================
>
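
As a sanity check on the two tables above, here is a minimal Python sketch
that recomputes the percentage column from the sample counts. It assumes the
percentages are truncated rather than rounded, which is inferred from the
numbers in this mail and not from the llvm-locstats source; the function and
variable names are made up for illustration.

```python
# Bucket sample counts copied from the two llvm-locstats tables in this mail.
NORMAL = {
    "0%": 765406, "(0%,10%)": 45179, "[10%,20%)": 51699, "[20%,30%)": 52044,
    "[30%,40%)": 46905, "[40%,50%)": 48292, "[50%,60%)": 61342,
    "[60%,70%)": 58315, "[70%,80%)": 69848, "[80%,90%)": 81937,
    "[90%,100%)": 101384, "100%": 2032034,
}
INSTR_REF = {
    "0%": 751201, "(0%,10%)": 40708, "[10%,20%)": 44909, "[20%,30%)": 47544,
    "[30%,40%)": 41630, "[40%,50%)": 42742, "[50%,60%)": 56692,
    "[60%,70%)": 53796, "[70%,80%)": 64476, "[80%,90%)": 73836,
    "[90%,100%)": 74423, "100%": 2123749,
}


def bucket_percentages(buckets):
    """Map each bucket to its share of all variables, as a whole percent.

    Truncation (integer division) is an assumption inferred from the tables,
    not taken from the llvm-locstats source.
    """
    total = sum(buckets.values())
    return {name: count * 100 // total for name, count in buckets.items()}


for label, buckets in (("normal", NORMAL), ("instruction referencing", INSTR_REF)):
    shares = bucket_percentages(buckets)
    # Total variables, then the share with no coverage and with full coverage.
    print(label, sum(buckets.values()), shares["0%"], shares["100%"])
```

Both totals match the "number of debug variables processed" lines, and the
fully covered bucket comes out as 59% and 62% respectively.
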
> The first observation: a significant increase in the byte-coverage statistic,
> meaning that we're able to track variable locations for longer and across more
> code. This was one of the main aims of this work, having better tracking of
> the locations that we know. The increase of seven percentage points includes an
> additional two percentage points of entry-value locations. If we disable entry
> value production then the scope-bytes-covered statistic moves from 59% to 64%,

Was this meant to be "from 64% to 59%"?
How does that compare to the baseline no-entry-value number?

Could you give a quick summary of the distinction between "PC ranges
covered" and "total availability"?

> which is still a decent improvement.
>
> The next observation is that the "total availability" of variables hasn't
> changed. This isn't the full story -- if you give an absolute name to every
> variable with a location in the clang binary, there are 6949 dropped locations
> and 22564 completely new locations, meaning roughly 1% of all variables in the
> program have changed, it's just hidden by the statistics rounding. More detail
> on the nature of the changes is below. I was hoping for more false locations
> to be dropped; it's quite likely that there are many more false locations
> dropped within variables that have more than one value, which aren't readily
> reflected in these statistics.
>
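
The comparison described above can be sketched as a set difference over
variable names. This is an illustration, not the script that was actually
used: the naming scheme and the toy names below are hypothetical, and only
the three counts at the bottom come from this mail.

```python
def diff_locations(before, after):
    """Return (dropped, new) names, given the variables that have a location."""
    before, after = set(before), set(after)
    return before - after, after - before


# Hypothetical absolute names of the form "<function>:<variable>".
normal = {"foo:x", "foo:y", "bar:i"}
instr_ref = {"foo:x", "bar:i", "bar:j", "baz:p"}
dropped, new = diff_locations(normal, instr_ref)
print(sorted(dropped), sorted(new))

# Figures quoted in this mail.
DROPPED, NEW, TOTAL_VARIABLES = 6949, 22564, 3415706
changed_percent = (DROPPED + NEW) * 100 / TOTAL_VARIABLES
print(f"{changed_percent:.2f}% of variables changed")
```

With the quoted figures this comes to a little under 1%, which is small
enough to vanish when availability is reported as a whole percentage.
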
> A natural question is: are all these new locations wrong, and the dropped
> locations only dropped because of bugs? To address that, I picked 20 new
> locations and 20 dropped locations at random and analysed why they happened.
> The input samples can be found here [2], along with an llvm-reduce'd version of
> each IR file. I confirmed the reason for the new/dropped location in the
> reduced and original file, as llvm-reducing them can alter the reason why
> something is dropped or not. Of the new locations, we previously could not
> track the location because:
>  * 14 DBG_VALUEs come after the vreg operand is out of liveness and are dropped
>    by LiveDebugVariables.
>  * 2 DBG_VALUEs are out of liveness and dropped by RegisterCoalescing
>    out of conservativeness.
>  * 2 DBG_VALUEs that appear before their operand is defined. This is out of
>    liveness; instruction referencing saves them by preserving debug
>    use-before-defs.
>  * 2 DBG_VALUEs that are out of liveness after a branch, but the value is live
>    down the other branch path.
>
> All of these locations can be tracked with instruction referencing because
> liveness is not a consideration, only availability in physical registers. 19 of
> the new locations were correct, while one tracked the right value but picked
> the wrong location for it, which I've now got a patch for.
>
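
A toy model of the liveness-versus-availability distinction drawn above,
for a single value in a straight-line block. It is a deliberately simplified
sketch and not how LiveDebugVariables or InstrRefBasedLDV are implemented;
the class and function names are hypothetical, and use-before-def and
control flow are not modelled.

```python
from dataclasses import dataclass


@dataclass
class ToyValue:
    def_index: int       # instruction that defines the value
    last_use_index: int  # last non-debug use; the live range ends here
    clobber_index: int   # first later instruction that overwrites the register


def kept_by_liveness(value, debug_use_index):
    """The debug use survives only if it falls inside the live range."""
    return value.def_index < debug_use_index < value.last_use_index


def kept_by_availability(value, debug_use_index):
    """The debug use survives while the register still holds the value."""
    return value.def_index < debug_use_index < value.clobber_index


# Defined at 1, last used at 3, register not overwritten until 8.
value = ToyValue(def_index=1, last_use_index=3, clobber_index=8)
for debug_use_index in (2, 5, 9):
    print(debug_use_index,
          kept_by_liveness(value, debug_use_index),
          kept_by_availability(value, debug_use_index))
```

The debug use at index 5 is the interesting case: it is past the end of the
live range, so a liveness-based scheme drops it, but the register still
holds the value, so an availability-based scheme can keep it.
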
> For the dropped locations:
>  * 8 false locations are dropped; they used to refer to the wrong value because
>    of a failure in register coalescing, see the body of [3].

Would these issues ^ show up/be testable with Dexter?

>  * 3 locations are unnecessarily dropped when different subregisters are
>    merged together in register coalescing.
>  * 3 locations are unnecessarily dropped due to conservative tracking of PHI
>    values (the code in D86814, can be fixed with more C++).
>  * 2 of the samples didn't actually have a dropped location; instead they
>    preserved an undef debug instruction in early-taildup, and my scripts picked
>    this up as dropping a location.
>  * 2 locations aren't tracked by InstrRefBasedLDV through a block that's
>    out of scope, meaning the location never covers instructions that are in
>    scope. VarLocBasedLDV is vulnerable to this too, but MachineSink can drop a
>    DBG_VALUE on the far side of the scope gap, saving the location. See
>    "Limitations" below.
>  * 2 locations dropped during tail duplication: one in early-taildup which
>    I haven't tried to address yet (see "Limitations"), one in late taildup
>    where a block containing only debug instructions isn't correctly duplicated.
>
> To summarise: all the new locations found were correct and not trackable by
> DBG_VALUE variable-location tracking, although there are some bugs in picking
> locations. Roughly half of the dropped locations are actual false locations,
> the other half are due to unimplemented or limited handling of optimisations in
> the instruction referencing code so far.
>
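
For reference, the two samples above can be tallied as follows. The counts
are the ones listed in this mail; the category labels are paraphrased, and
the grouping into "false", "not really dropped" and "limitations" is one
reading of the summary, not something stated category by category.

```python
# Why each sampled new location was not trackable before.
NEW_LOCATION_CAUSES = {
    "vreg out of liveness, dropped by LiveDebugVariables": 14,
    "out of liveness, dropped by RegisterCoalescing": 2,
    "DBG_VALUE before its operand is defined": 2,
    "out of liveness after a branch": 2,
}

# Why each sampled dropped location disappeared.
DROPPED_LOCATION_CAUSES = {
    "false location from a coalescing failure": 8,
    "subregister merge in register coalescing": 3,
    "conservative PHI tracking": 3,
    "not actually dropped (undef kept in early-taildup)": 2,
    "block out of scope": 2,
    "tail duplication": 2,
}

# Assumed grouping: which dropped categories are limitations of the new code.
LIMITATIONS = (
    "subregister merge in register coalescing",
    "conservative PHI tracking",
    "block out of scope",
    "tail duplication",
)

false_locations = DROPPED_LOCATION_CAUSES["false location from a coalescing failure"]
from_limitations = sum(DROPPED_LOCATION_CAUSES[name] for name in LIMITATIONS)

print(sum(NEW_LOCATION_CAUSES.values()), sum(DROPPED_LOCATION_CAUSES.values()))
print(false_locations, from_limitations)
```

Each sample adds up to 20. Of the dropped sample, 8 are false locations, 2
were never really dropped, and 10 come from unimplemented or limited
handling, which is the "other half" in the summary.
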
> This pretty much fulfils the objective of this work: we're able to save a lot
> more variable locations through the register allocator because we don't have to
> be so conservative about liveness. Plus, the default behaviour of all
> optimisations now is to _drop_ a variable location, as opposed to the existing
> situation where after we leave SSA form, all bets are off.
>
> Another question is how much this costs in compile time: a clang-3.4 build
> using instruction referencing on my otherwise idle machine usually tracks
> within 2% of a normal build. This is IMO expected given the larger amount of
> debugging information being produced, and I haven't closely studied the
> performance of a whole build using instruction referencing yet, so it'll
> probably get better. A more recent change to InstrRefBasedLDV has added a big
> slowdown though, so I'm going to skip reporting any performance results for
> now.
>
> Current situation
> =================
>
> Some of this work has landed; I've got some patches up for review [4] that
> implement the core parts. I also have a long tail of tweaks and
> location-salvaging in a tree here [5] which just fleshes out more optimisation
> passes and installs bugfixes. (Commits there are not written to be human
> consumable, alas). There are no fatal flaws in the design as far as I'm aware,
> although there are some annoyances (see "Limitations").
>
> The biggest problem is that this all relies on a new LiveDebugValues
> implementation that doesn't have sufficient test coverage yet, and is still
> Somewhat Experimental (TM). Given the number of times an unpleasant performance
> cliff has been found in VarLoc LiveDebugValues, it wants a long time to soak in
> before being deployed.
>
> Limitations
> ===========
>
> Here's a non-exhaustive list of known problems. None of them are fatal IMO,
> and each has only a small effect on variable availability:
>  * Early tail duplication: like late tail duplication, this tears apart SSA
>    information and can cause the same "Value" to be defined twice. This is
>    solvable using the SSAUpdater utility, which early-taildup already uses.
>  * Attaching a debug instruction number to a COPY instruction is highly
>    undesirable because the COPY doesn't actually define a value, it just moves
>    it between locations. At least one optimisation (X86 LEAtoMOV) transforms
>    instructions into COPYs (LEA $rsp + 0 => COPY $rsp), which is unfortunate.
>    This doesn't happen a lot though, and can be fixed by dropping a DBG_PHI
>    of the COPYd register nearby. Plus it only happens post-regalloc, which
>    makes it less of a problem.
>  * Trivial def rematerialization: there's no pattern to rely on in how the
>    register allocator rematerializes values, and so values can rematerialize
>    in different registers dominating different parts of the CFG. It's hard to
>    track the variable location after that, because it has multiple values in
>    the eyes of InstrRefBasedLDV. My preference would be, seeing how these defs
>    are effectively constants, to have the target describe such trivial defs
>    in a DIExpression. That avoids having to track the location of a constant
>    that we already know.
>  * As mentioned in the "missing" variable locations list, gaps in lexical
>    scopes can lead to locations not being propagated sufficiently far, a
>    problem for both variable-location tracking solutions as documented in
>    PR48091. However, using DBG_VALUEs to track variable locations can save a
>    few of them because MachineSink can sink DBG_VALUEs over the scope gap,
>    whereas instruction-referencing tries to rely on tracking debug
>    use-before-defs which don't propagate across scope gaps. More on how to
>    resolve this in PR48091.
>
> Next Steps
> ==========
>
> While this isn't ready for general use yet, it'd be great to get as much as
> possible into llvm-12 behind the -Xclang
> -fexperimental-debug-variable-locations flag. That eases the path to testing
> for consumers, which gives a greater chance of finding worst-case slowdowns in
> advance of instruction referencing being generally available.
>
> There's a decent amount of stuff under "Limitations" above that I can address,
> plus some performance profiling is still needed. I imagine the next best thing
> to do is add support for GlobalISel and some non-X86 backends (certain
> TargetInstrInfo hooks need to perform debug-info bookkeeping), which would make
> this all more appetising.
>
> [0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
> [1] http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html
> [2] https://github.com/jmorse/llvm-inst-ref-test-samples
> [3] https://reviews.llvm.org/D86813
> [4] https://reviews.llvm.org/D88898
> [5] https://github.com/jmorse/llvm-project/commit/0a702b967927d888bd222806252783359fc74d57
>
> --
> Thanks,
> Jeremy