[PATCH] D138869: [Docs][RFC] Add AMDGPU LLVM Extensions for Heterogeneous Debugging

Mon Dec 5 15:28:09 PST 2022

scott.linder added a comment.

In D138869#3966334 <https://reviews.llvm.org/D138869#3966334>, @jmorse wrote:

> Hi Scott,
>
> I really enjoyed the conference talk about this, and moving the issues of how variables are fragmented into smaller chunks to higher up in the metadata hierarchy makes a lot of sense. It could substantially simplify + improve our tracking of variables today.
>
> There's a lot of different things being re-engineered in this proposal, and I'd like to make sure that I have the correct understanding of how the current variable location design maps to this new one. As I understand it, dbg.values become dbg.def and dbg.kill intrinsics, connecting IR Values to DILifetime objects. The DILifetimes refer to a hierarchy of DIFragments that specify what's being defined (which is great), and the expression required to produce the variable value / location from the inputs.

Yes, that all matches up with the proposal!

> After that it becomes fuzzier though: it's not obvious to me how the current variable value / location is determined when there can be (according to the document) multiple disjoint and overlapping lifetimes that are active. If a variable fragment has different runtime values, which one should we pick, and how -- or if they're supposed to always have the same value, what guarantees this during optimisation? Right now, dbg.value intrinsics are effectively an assignment to the variable [fragment], and the variable value is the last dominating dbg.value assignment (or possibly a PHI between multiple of them, determined by LiveDebugValues). What is the equivalent for these new intrinsics?

There are actually multiple locations at runtime; as you say, the compiler must guarantee they contain the same value at runtime, so the debug info consumer can read from any location. However, the debug info consumer must write to each location.
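
As a rough illustration of that read-any / write-all contract, here is a toy Python model. It is not taken from the proposal: `Location`, `read_variable` and `write_variable` are names made up for this sketch, and it assumes the compiler has already upheld the "same value everywhere" guarantee.

```python
from dataclasses import dataclass


@dataclass
class Location:
    """Hypothetical stand-in for one machine location (register, stack slot, ...)."""
    name: str
    value: int


def read_variable(active):
    # All active locations are assumed to agree, so reading any one is enough.
    return active[0].value


def write_variable(active, new_value):
    # A write has to reach every active location to keep them in agreement.
    for loc in active:
        loc.value = new_value


active = [Location("reg", 7), Location("stack_slot", 7)]
write_variable(active, 42)
```

After the write, both locations hold the new value, so a later read from either one is still valid.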

Rather than acting as assignments to a mutable, singleton "variable location", the intrinsics each act independently and must refer to a distinct "lifetime" (DILifetime). If in the old world there are 4 calls to dbg.value for a single variable, the new version would instead create 4 DILifetimes and replace each dbg.value with its own dbg.def+dbg.kill pair, with no two pairs overlapping.
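
To make the 4-to-4 mapping concrete, here is a small Python sketch. It is only a model: it assumes straight-line code with instructions numbered by position, and `DbgValue`, `Lifetime` and `to_lifetimes` are hypothetical names rather than anything in LLVM or in the proposal. Placing each kill where the next dbg.value used to take over is one plausible choice, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbgValue:
    """Old model: from instruction index `pos`, the variable is described by `value`."""
    pos: int
    value: str


@dataclass(frozen=True)
class Lifetime:
    """Toy stand-in for a DILifetime bounded by one dbg.def and one dbg.kill."""
    value: str
    def_pos: int   # where the dbg.def would go
    kill_pos: int  # where the dbg.kill would go (exclusive end)


def to_lifetimes(dbg_values, block_end):
    # Assumption: straight-line code, so ordering by position is meaningful.
    ordered = sorted(dbg_values, key=lambda d: d.pos)
    # Each lifetime ends where the next dbg.value used to take over;
    # the last one runs to the end of the block.
    ends = [d.pos for d in ordered[1:]] + [block_end]
    return [Lifetime(d.value, d.pos, end) for d, end in zip(ordered, ends)]


old = [DbgValue(0, "%a"), DbgValue(3, "%b"), DbgValue(5, "%c"), DbgValue(9, "%d")]
lifetimes = to_lifetimes(old, block_end=12)
```

Each of the 4 resulting lifetimes stands alone, so a pass could delete or move one def/kill pair without having to reason about the other three.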

> I think lifetimes and def/kill relationships make sense after register allocation where that's the form the program is in, but it's not clear how it would work in SSA form. It's also worth noting that the multiple-locations-after-regalloc problem is solved, to a large extent, by the instruction referencing rewrite [0], essentially keeping the debugging information in SSA form and then recognising target COPY and value-movements to track the multiple locations in which a value can be resident.

Even in SSA an `llvm::Value` may only coincide with the source variable for part of its existence. In the old model this is represented by having multiple calls to dbg.value which refer to the same source variable. What about the def/kill representation seems like it won't work in SSA form?

I'm not sure I understand the relationship between the instr-ref work and multiple-locations; it seems to me that it still only leaves us with one machine location per variable at any given position in the program, or am I not understanding something?

> There's value in having multiple ways of expressing variable locations: during loop-strength-reduction you can recompute a variable from the loop starting values or from the strength-reduced variables, for example. It needs to be approached with some delicacy though to save memory.
>
> At a more abstract level, I've a worry that this might move us more in the direction of requiring more knowledge / maintenance during optimisation passes to preserve debug-info invariants, where it seems more beneficial to reduce that kind of maintenance, in compile and engineering time. It's certainly the motivation behind the assignment tracking work [1], which is inferring information about optimisations from what gets deleted rather than what gets preserved.

I need to spend a bit more time going through the assignment-tracking work to form a better response, but the principle of reducing the work required in the vast majority of passes while maintaining meaningful and accurate debug information sounds great!

> [0] https://www.youtube.com/watch?v=yxuwfNnp064
> [1] https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir/62367

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138869/new/

https://reviews.llvm.org/D138869