<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Thanks for sharing this RFC, Jeremy.<div class=""><br class=""></div><div class=""><div><blockquote type="cite" class=""><div class="">On Jun 18, 2020, at 8:35 AM, Jeremy Morse <<a href="mailto:jeremy.morse.llvm@gmail.com" class="">jeremy.morse.llvm@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Hi debuginfo-cabal,<br class=""><br class="">tl;dr: Let's please consider using a new implementation of LiveDebugValues<br class="">that produces richer information, might be slightly faster, but mostly will<br class="">support the instruction referencing and value tracking paradigm from my RFC [0]<br class="">rather than the location tracking that LiveDebugValues does today.<br class=""><br class="">In that RFC, the main motivator is to treat variable locations a bit more like<br class="">SSA, having variable location metadata refer to the instructions that define a<br class="">value, and track the value moving around registers from there. This<br class="">prototype [1] is an implementation of the "track" mechanism, where we produce<br class="">value numbers for every 'def' in a function, and track them being moved around.<br class="">Variable locations are then resolved and extended by following where the value<br class="">numbers go.<br class=""></div></div></blockquote><div><br class=""></div><div>Your earlier RFC sketches a plan for extending LiveDebugValues to handle instruction-referencing DBG_VALUEs. From <a href="http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html" class="">http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html</a> --</div><div><br class=""></div><div></div><blockquote type="cite" class=""><blockquote type="cite" class=""><div>How then do we translate this new kind of machine location into</div><div class="">DWARF/CodeView variable locations, which need to know which register</div><div class="">to look in? The answer is: LiveDebugValues [3]. We already perform a</div><div class="">dataflow analysis in LiveDebugValues of where values "go" after</div><div class="">they're defined: we can use that to take "defining instructions" and</div><div class="">determine register / stack locations. We would need to track values</div><div class="">from the defining instruction up to the DBG_VALUE where that value</div><div class="">becomes a variable location, after which it's no different from the</div><div class="">LiveDebugValues analysis that we perform today.</div></blockquote></blockquote><div><br class=""></div><div>I had interpreted this as meaning: introduce transfer functions to walk from the referenced instruction until we reach the referencing DBG_VALUE, then create a VarLoc. IIUC, at that point the unmodified LiveDebugValues algorithm could proceed as usual.</div><div><br class=""></div><div>Reading this RFC, I get the sense that that has several drawbacks: it doesn’t handle situations where the referenced instruction is in a different basic block than its referencing DBG_VALUE (all the ‘ugly phi stuff’ is missing!), and it doesn’t track all the locations for a variable. Is that correct?</div><br class=""><blockquote type="cite" class=""><div class=""><div class="">This matches, to an extent, the "Register Data Flow" / RDF graph analysis<br class="">that recently moved from the Hexagon backend to CodeGen. I had a look, and<br class="">it appears RDF computes much richer information than LiveDebugValues needs<br class="">(specifically, all uses of values) and allocates for each value number, which<br class="">is quite a bit of overhead. I decided it'd be simpler to make a new<br class="">LiveDebugValues implementation, as we've had performance problems with<br class="">LiveDebugValues in the past, and we don't need to track additional metadata<br class="">for each value, only number them uniquely.<br class=""><br class="">There are few independent reasons to adopt this new implementation, and I'm<br class="">self-conscious about dropping a code-bomb on people. My questions for this<br class="">email are:<br class=""> * Does the new implementation (described below) make sense, and<br class=""></div></div></blockquote><div><br class=""></div><div>I think your results speak for themselves :).</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""> * If so, how would I go about landing this?</div></div></blockquote><div><br class=""></div><div>One thing I’m not yet clear on: does your prototype essentially implement the minimal extension to LiveDebugValues needed to handle instruction-referencing DBG_VALUEs? If not, should it?, or should we take the opportunity to remove limitations from the current LiveDebugValues algorithm?</div><div><br class=""></div><div>As for how to land this, I propose:</div><div><br class=""></div><div>- Move LiveDebugValues.cpp to lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp</div><div>- Split out common transfer functions and other helpers into lib/CodeGen/LiveDebugValues/Common.{h,cpp}</div><div>- Finally, add lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp</div><div>- Add a cl::opt like -instr-ref-dbg-values to allow selecting a DBG_VALUE flavor</div><div><br class=""></div><blockquote type="cite" class=""><div class=""><div class="">We could wait until there's a complete suite of patches for the instruction<br class="">referencing work (I've got a working prototype, but not production-quality).</div></div></blockquote><blockquote type="cite" class=""><div class=""><div class="">On the other hand, this new LiveDebugValues implementation is probably ready<br class="">for review now, works independently, and is very much the hardest part of<br class="">instruction referencing, so it'd be nice to get it under-way earlier. My<br class="">preferred outcome would be adding it as a parallel implementation to the<br class="">current LiveDebugValues, enabled by a runtime switch, then start to land<br class="">the rest of the instruction referencing work around it. This reduces the<br class="">scope for further bit-rot, and as Vedant mentioned [2] this would all need<br class="">to sit under a runtime-switch for a transition period anyway.<br class=""></div></div></blockquote><div><br class=""></div><div>Yes, I’m in favor of landing work related to instruction-referencing DBG_VALUEs in small-ish independent patches, as they become ready. I’d very much like to keep the unmodified var-loc DBG_VALUE code paths intact and separated.</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class=""># Overview<br class=""><br class="">Today's LiveDebugValues (explained in the file docu-comment [3]) concerns<br class="">itself with variable /locations/ only, not the values that they contain. The<br class="">implementation is very good at its job: it detects when a location is<br class="">invalidated (i.e. clobbered), control flow joins are a simple matter of "Do<br class="">the predecessors agree on the location?", and not much more work is needed.<br class=""><br class="">Switching to tracking variable values is more complicated. Following the<br class="">original proposal, for this pseudo-MIR code sequence:<br class=""><br class="">    $rax = somedef<br class="">    DBG_INSTR_REF reference-to-somedef<br class="">    $rbx = COPY killed $rax<br class="">    ; Variable location is $rax, then moves to $rbx<br class=""><br class="">We need to track that:<br class=""> a) $rax contains a value defined by the "somedef" instruction,<br class=""> b) The DBG_INSTR_REF needs a location for the value somedef defines.<br class=""> c) That value from "somedef" is copied to rbx,<br class=""><br class="">The new LiveDebugValues implementation doesn't have DBG_INSTR_REF yet, so<br class="">I've written it to work with DBG_VALUEs too, to prove that it works just as<br class="">well as the current implementation. It deals with code sequences such as:<br class=""><br class="">    $rax = somedef<br class="">    DBG_VALUE $rax<br class="">    $rbx = COPY killed $rax<br class="">    ; Variable location is $rax, then moves to $rbx<br class=""><br class="">In which the same tracking has to occur, but the DBG_VALUE "reads" a value<br class="">number from $rax. As the variable locations are determined by the value number<br class="">that the variable has, it doesn't matter whether that was specified by a<br class="">DBG_VALUE or DBG_INSTR_REF.  The tracking is sufficiently good to produce an<br class="">identical binary to the output of current LiveDebugValues (details below).<br class=""><br class="">There are some unrelated benefits to consider: because the new implementation<br class="">tracks all locations that a value may be in, we can handle the following<br class="">problem, that the current implementation cannot:<br class=""><br class="">  entry:<br class="">    $rax = somedef<br class="">    DBG_VALUE $rax<br class="">    SPILL_TO_STACK %somestackslot, $rax<br class="">  loop:<br class="">    [... some code...]<br class="">    $rax = RESTORE_FROM_STACK %somestackslot<br class="">    do_things_with_rax<br class="">    brcc loop<br class="">  exit:<br class="">    ret<br class=""><br class="">Here, there's a variable spilt to the stack, that's temporarily restored<br class="">in the loop. At the end of the "loop" block, the value has two locations:<br class="">$rax and the stack slot. The current LiveDebugValues always follows the<br class="">restore from the stack: it decides the variable is on the stack at the end<br class="">of "entry", and in $rax at the end of "loop". Because these locations don't<br class="">agree on entry to "loop", the location in "loop" is dropped completely.<br class="">Having LiveDebugValues ignore or prefer the restore just changes what kind of<br class="">circumstances locations get dropped in: the underlying problem is that the<br class="">value can be in multiple places, but we can track only one of those places.<br class=""><br class="">The new implementation has enough information to solve this, as it tracks all<br class="">values in all machine locations. Unfortunately it doesn't make much difference<br class="">to the location statistics for clang RelWithDebInfo builds. An additional<br class="">benefit is that we would be better placed to produce saner locations lists [4].<br class=""><br class="">In terms of performance, the new implementation takes on average just as long<br class="">as the current implementation. A full stage2 build of clang may be 0.1% slower,<br class="">but it's in the noise on my machine. All the algorithms are much more localised<br class="">than the current implementation (see below), and I reckon there's still scope<br class="">for further optimisation as it's easier to make decisions such as "Don't try to<br class="">join these variable locations in this block, they're immediately re-stated by<br class="">DBG_VALUEs".<br class=""><br class=""># Implementation<br class=""><br class="">The new implementation strictly does more work than the old one: it tracks every<br class="">value in the function, not just those that are associated with a variable<br class="">location. It also has to detect places where control flow merges and produces<br class="">PHI values, where the old implementation could just check that the live-out<br class="">variable location in predecessors agreed. To address this, I've separated the<br class="">implementation into two smaller problems, that can be solved almost<br class="">independently and with much greater locality.<br class=""><br class="">Both of the problems below are sort-of-like SSA construction (they have a set<br class="">of defs, and a large set of uses), but instead of calculating new PHI<br class="">placements, we have to infer where PHIs _were_ placed.<br class=""><br class="">Problem 1: Machine value numbering. Every def in the function is given a<br class="">number, which is then condensed into a transfer function for each block,<br class="">specifying which registers (or spill locs) contain which values. It's<br class="">genuinely just a map of machine location to value. Using the same sort of<br class="">dataflow model that LiveDebugValues currently uses, values are propagated to<br class="">compute the live-ins and live-out values of each block, with one significant<br class="">modification: we assign a new value number when control flow merges different<br class="">values, essentially re-computing PHI nodes. This leads to some ungraceful<br class="">code: see "Ugly PHI stuff" below. This problem is kept localised by only<br class="">considering the registers and spill slots that are accessed, making the<br class="">problem size proportionate to the product of the max working set of the<br class="">function and the number of blocks. The number of variables used is irrelevant.<br class=""><br class="">Problem 2: Variable value assignment. By using the solution for the first<br class="">problem, of "what value number does each machine location contain" at each<br class="">program point, every DBG_VALUE can be interpreted as assigning a value to a<br class="">variable, not a location. This gets turned into a transfer function too, and<br class="">we can then perform another dataflow analysis, propagating what _value_ a<br class="">variable has in a block, and computing the set of live-in variable values for<br class="">each block. A specific machine location can be picked after this problem is<br class="">solved, possibly from several options if multiple locations have the same<br class="">value. This isn't a completely clean abstraction: predecessors with different<br class="">variable values that are in the same location should be propagated, but the<br class="">new implementation has to find and validate a PHI value number that represents<br class="">those merged values (see "More ugly PHI stuff" below).<br class=""><br class="">I'm keeping problem 2 localised by propagating variable values for one lexical<br class="">scope at a time, and only over the blocks that are in that scope. As well as<br class="">localising by turning the main problem into a large number of small problems,<br class="">this reduces the amount of work too. The current implementation filters out<br class="">variable locations that are out of lexical scope every time a block is<br class="">processed, which now no longer needs to happen. Variables in the outermost<br class="">scopes won't be re-processed when LiveDebugValues goes around an inner-scope<br class="">loop several times; likewise variables in inner scopes won't get re-analysed<br class="">when an outer scope variable needs another pass through the function. This also<br class="">means that all lexical scopes that cover a single block (or zero) require no<br class="">dataflow analysis, which should attenuate the effects of heavy inlining a bit.<br class=""><br class="">The solution to these two problems (block live-in variable values and block<br class="">live-in machine location values) can then be combined in a final step through<br class="">the function, producing DBG_VALUEs for live-ins and DBG_VALUEs for any<br class="">location transfers within a block.<br class=""><br class=""># Testing and validation<br class=""><br class="">The new LiveDebugValues implementation produces an identical AMD64 clang-3.4<br class="">binary to the old implementation, based on d481e59863a (Feb 2020). Well,<br class="">almost -- it proved too difficult to completely emulate the old version,<br class="">instead with the following changes [5] applied to d481e59863a, the two<br class="">implementations produce an identical binary:<br class=""> * Applied D67500 [6],<br class=""> * Ignore identity copies,<br class=""> * Force any newly created DBG_VALUEs to be inserted according to the order<br class="">   in which DebugVariables appear in the input function. So that they're<br class="">   consistent between the two implementations.<br class=""> * Disabling all tracking of stack spills.<br class=""><br class="">Ignoring the final point, I think this would demonstrate that the two<br class="">implementations are equivalent. Stack spills are tricky: because value<br class="">transfers within blocks are left until the final step of the new<br class="">implementation, it's hard to work out in advance when old LiveDebugValues<br class="">would have transferred a location. It was easier to just compare binaries<br class="">where spills hadn't been followed.<br class=""><br class="">IMO, the non-spill (the vast majority) locations being identical should be<br class="">sufficient to show that the overall algorithm works, and dealing with spills<br class="">is a minor extension.<br class=""><br class="">The tests pass, except for the ones covered by the limitations section, and<br class="">a few DBG_VALUE-change-order issues.<br class=""><br class=""># Limitations and unimplemented features<br class=""><br class="">There are a couple of things that I haven't gotten around to implementing<br class="">yet, for which the tests fail:<br class=""><br class="">Overlapping fragments: this is trivial to do, but at the back of my job queue,<br class=""><br class="">Entry values: much more interesting. As Djordje has pointed out elsewhere,<br class="">LiveDebugValues has been assuming that every DBG_VALUE assigns a new value<br class="">to a variable [7], which means if a DBG_VALUE re-states a variable location<br class="">when it doesn't have to, we risk dropping an entry value. This will be much<br class="">easier in this new model (I haven't had the time to do it yet), because the<br class="">machine value numbering means that each entry value has a specific number,<br class="">and we can easily identify when it's no longer available. This should just<br class="">be a matter of the in-block transfers tracking code identifying that a<br class="">variable should have the entry value; it not being any machine location<br class="">currently; and issuing an appropriate DBG_VALUE.<br class=""><br class=""># Ugly PHI stuff<br class=""><br class="">Above, I said that this problem is a lot closer to SSA construction than other<br class="">classic compiler problems. However, it's actually even weirder: instead of<br class="">having a set of uses and defs and computing the locations of PHIs: every<br class="">instruction with a DebugLoc in lexical scope is a use, and the PHIs are already<br class="">placed just not recorded. Thus the problems above are more about recovering<br class="">the locations of PHIs after they're eliminated.</div></div></blockquote><blockquote type="cite" class=""><div class=""><div class=""><br class="">This adds a lot of complexity to the lattice structure used for each variable<br class="">location compared to the current LiveDebugValues: there used to only ever be<br class="">one candidate location for each variable at every control flow merge, and over<br class="">the course of dataflow analysis this was found to be either true or false.<br class="">However because we're iteratively discovering new PHI locations as we explore,<br class="">the new implementation's lattice consists of:<br class=""> * A single candidate def (\top),<br class=""> * A PHI value at every control flow merge on the path between between the def<br class="">   and the current block,<br class=""> * A PHI value at the current block (\bot).<br class="">Each of which potentially needs one iteration of dataflow to see whether the<br class="">join should produce that value.<br class=""><br class="">This only truly needs exploring at loop heads, and the maximum number of<br class="">iterations required is the depth of loops in the function, which is the same as<br class="">the old LiveDebugValues implementation. However because all the PHI values<br class="">have to be discovered, that number of iterations is required more often than<br class="">the old implementation.<br class=""><br class="">I wrote more about the lattice in the docu-comment in the new implementation<br class="">[8]. I can produce a worked example too, but this email has gotten a bit<br class="">wordy already.<br class=""><br class=""># More ugly PHI stuff<br class=""><br class="">The section above holds true for the variable values analysis too, where<br class="">instead of defs we have variable assignments at DBG_VALUEs. Rather than<br class="">discovering where PHIs happen and propagating that information, variable<br class="">value analysis has to do the same to discover what machine-value PHI would be<br class="">needed to merge a variable location from all predecessors, and then examine<br class="">whether such a PHI is actually available, and has the right inputs. (This is<br class="">annoyingly fiddly).<br class=""><br class=""># Summary<br class=""><br class="">This new implementation goes fast enough, is accurate enough, and appears<br class="">to be sound. That isn't a reason to replace the current implementation; but<br class="">eventually the instruction reference work would make use of this, so if<br class="">possible I'd like to start review on it, if that's acceptable to people.<br class=""><br class="">Seeing how you've just read a lot of words, a motivating reminder: this path<br class="">can lead, slowly, to there being no debug instructions for variable locations!<br class="">And as described in my RFC sitrep-email, it can be more accurate than<br class="">location only tracking too.<br class=""><br class="">[0] <a href="http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html" class="">http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html</a><br class="">[1] Branch is here:<br class=""><a href="https://github.com/jmorse/llvm-project/commits/ldvars-separation" class="">https://github.com/jmorse/llvm-project/commits/ldvars-separation</a> , tip<br class="">is 11450def5bc592321ece2a64ea2a8cabbc614c39 at time of sending. I<br class="">could open a PR containing the changes, but github mucks them up; best<br class="">to just look at the bare file [8].<br class="">[2] <a href="http://lists.llvm.org/pipermail/llvm-dev/2020-February/139469.html" class="">http://lists.llvm.org/pipermail/llvm-dev/2020-February/139469.html</a><br class="">[3] <a href="https://github.com/llvm/llvm-project/blob/3626eba11f2367da88ccb0280a34731243278034/llvm/lib/CodeGen/LiveDebugValues.cpp#L9" class="">https://github.com/llvm/llvm-project/blob/3626eba11f2367da88ccb0280a34731243278034/llvm/lib/CodeGen/LiveDebugValues.cpp#L9</a><br class="">[4] <a href="https://bugs.llvm.org/show_bug.cgi?id=42834" class="">https://bugs.llvm.org/show_bug.cgi?id=42834</a><br class="">[5] <a href="https://github.com/jmorse/llvm-project/commit/720876fec2a5113d52977d00e0e1f7d526b4c1f0" class="">https://github.com/jmorse/llvm-project/commit/720876fec2a5113d52977d00e0e1f7d526b4c1f0</a><br class="">four commits from here<br class="">[6] <a href="https://reviews.llvm.org/D67500" class="">https://reviews.llvm.org/D67500</a> , I probably should push to land this<br class="">[7] It's not a bad view, and definitely something I've been preaching,<br class="">[8] <a href="https://github.com/jmorse/llvm-project/blob/11450def5bc592321ece2a64ea2a8cabbc614c39/llvm/lib/CodeGen/LiveDebugValues.cpp#L60" class="">https://github.com/jmorse/llvm-project/blob/11450def5bc592321ece2a64ea2a8cabbc614c39/llvm/lib/CodeGen/LiveDebugValues.cpp#L60</a><br class=""><br class="">--<br class="">Thanks for reading this essay,<br class="">Jeremy<br class=""></div></div></blockquote></div></div><div class=""><br class=""></div><div class="">vedant</div></body></html>