[llvm-dev] [RFC] Moving llvm.dbg.value out of the instruction stream

Tue Oct 23 09:40:33 PDT 2018

Thanks for writing this up! I think this is definitely worth exploring.

> On Oct 22, 2018, at 6:43 PM, Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
<...>

> Design
> -------
> 
> dbg.value is meant to associate an SSA value with a label. My basic idea is that it's better to attach a label to the instruction it precedes rather than creating a separate instruction. However, a label should stay behind if the instruction it precedes moves. For example, Instruction::moveAfter/moveBefore should implicitly reattach their variable tracking info to the next instruction. While working on https://reviews.llvm.org/D51664, I realized it would be pretty easy to use the ilist callbacks currently used for symbol table updating to implement this.
> 
> I think making the attachments describe the values of variables before the instruction is good, because in the general case every block has at least one instruction, which typically doesn't produce a value: the terminator. The only terminator that produces a value is invoke, and we can attach any variable tracking info for it to the first (non-phi?) instruction in the normal successor.
> 
> I think, overall, this is just an internal representation shift. We may want to change our bitcode and assembly language to better align with our internal representation, but it should be functionally equivalent.
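
To make the "label stays behind when the instruction moves" rule concrete, here is a minimal sketch in plain C++. It is a toy model under stated assumptions, not LLVM's API: Inst, labelsBefore, and this free-standing moveBefore are hypothetical names, and each instruction simply carries the list of labels that take effect immediately before it.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical toy model (not LLVM's API): each instruction carries the
// variable-tracking labels that take effect immediately before it.
struct Inst {
  std::string name;
  std::vector<std::string> labelsBefore;
};
using InstList = std::list<Inst>;

// Move *it in front of dest within the same block. The labels stay at the old
// position: they are prepended to the instruction that used to follow *it.
void moveBefore(InstList &bb, InstList::iterator it, InstList::iterator dest) {
  InstList::iterator next = std::next(it);
  if (dest == it || dest == next)
    return; // already in place, so nothing is left behind
  // Assumption of this sketch: the last instruction (the terminator) never moves.
  assert(next != bb.end() && "toy model: cannot move the last instruction");
  next->labelsBefore.insert(next->labelsBefore.begin(),
                            it->labelsBefore.begin(), it->labelsBefore.end());
  it->labelsBefore.clear();
  bb.splice(dest, bb, it); // relink the node; iterators stay valid
}
```

Prepending keeps the labels in their original program order. The moved instruction arrives at its new position with no labels of its own.
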
> 
> 
> IR syntax
> ----------
> 
> This is kind of important, since it affects how developers think about these and keep them up to date. Consider the following C source:
> 
>   int foo(void);
>   int bar(void) {
>     int v = foo();
>     v = foo();
>     return v;
>   }
> 
> I'm proposing moving from something like this:
> 
>   %v1 = call i32 @foo()
>   call void @llvm.dbg.value(metadata i32 %v1, metadata !123, metadata !DIExpression) !dbg !456
>   %v2 = call i32 @foo()
>   call void @llvm.dbg.value(metadata i32 %v2, metadata !123, metadata !DIExpression) !dbg !456
>   ret i32 %v2
> 
> To something like this:
> 
>   %v1 = call i32 @foo()
>   !dbgvalue i32 %v1, variable !123, loc !456 [, expr ...]
>   %v2 = call i32 @foo()
>   !dbgvalue i32 %v2, variable !123, loc !456 [, expr ...]
>   ret i32 %v2

At first glance, this example seems to contradict the definition in the Design section. If a !dbgvalue describes the debug info *before* the instruction, I would have expected the syntax to be like

  %v1 = call i32 @foo(), !dbg !1
  %v2 = call i32 @foo(), !dbg !2, !dbgvalue i32 %v1, variable !123, loc !456 [, expr ...]

So my question is: which instruction is the !dbgvalue attached to, and should we make this more explicit in the syntax?

> 
> 'expr' would be an optional DIExpression, expressed more compactly if possible, perhaps using our inline !DIExpression parsing for now.
> 
> This also avoids the confusion that today the dbgloc attached to a dbg.value is not actually used to generate line table entries, it's only for tracking distinct variables created from different inlined call sites.

We could go one step further and only specify an optional inlinedAt field instead of a full location.

Alternatively, we could start paying attention to the !dbg location of dbg.values. When instructions are moved around by transformations, it is often ambiguous where a dbg.value belongs. If we tied a dbg.value more closely to a specific DILocation, some of that ambiguity might go away, but so far I couldn't come up with a consistent model that uses this approach.

One last thing to consider for both syntax and internal representation: it would be nice if the model naturally allowed for a future extension (https://bugs.llvm.org/show_bug.cgi?id=39141) where one dbg.value's DIExpression can refer to more than one SSA value.

-- adrian

> 
> 
> Data structures
> ----------------
> 
> What we're really trying to represent is two separate sequences. For simplicity and minimal change, I initially propose that we create a secondary linked list of some new DbgValue data structures. They would relate kind of like this:
> 
>   inst1 -> inst2 -> inst3 -> inst4
>     \        \        \        \
>     dv1 ->   dv2 ->   dv3 ->   dv4
> 
> If we were to delete inst2, dv2 would be reattached to the following instruction:
> 
>   inst1 -> inst3 ->          inst4
>     \        \                 \
>     dv1 ->   dv2 ->   dv3 ->   dv4
> 
> What's interesting is that once dv2 and dv3 are grouped together, it's actually a bug if we ever generate code between them. I can imagine more compact representations than a linked list, but we want to be able to efficiently join two sequences of dbgvalues when deleting or moving a single instruction, and linked lists achieve that. The list would be owned by the BasicBlock, since if a whole block is unreachable, there's nowhere to put the value tracking info.
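
The secondary-list idea can be sketched in plain C++ as follows. This is only an illustration under assumptions, not a proposed implementation: Block, Inst, DbgValue, dvBegin, append, valuesBefore, and erase are hypothetical names. The block owns a single list of dbg values, and each instruction stores an iterator to the first dbg value that takes effect before it, so an instruction's group is the range up to the next instruction's iterator. Joining two groups on deletion is then a single iterator assignment.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical toy types; none of these names are LLVM's API.
struct DbgValue {
  std::string desc;
};
using DVList = std::list<DbgValue>;

struct Inst {
  std::string name;
  DVList::iterator dvBegin; // first dbg value taking effect before this inst
};
using InstList = std::list<Inst>;

struct Block {
  DVList dbgValues; // one sequence per block, owned by the block
  InstList insts;

  // Append an instruction preceded by the given dbg values.
  InstList::iterator append(const std::string &name,
                            const std::vector<std::string> &dvs) {
    DVList::iterator first = dbgValues.end();
    for (const std::string &d : dvs) {
      DVList::iterator it = dbgValues.insert(dbgValues.end(), DbgValue{d});
      if (first == dbgValues.end())
        first = it;
    }
    // Trailing instructions with an empty group pointed at end(); retarget
    // them to the new group's start so their own group stays empty.
    for (auto r = insts.rbegin();
         r != insts.rend() && r->dvBegin == dbgValues.end(); ++r)
      r->dvBegin = first;
    return insts.insert(insts.end(), Inst{name, first});
  }

  // Dbg values in effect immediately before *it: the half-open range from
  // its dvBegin up to the next instruction's dvBegin.
  std::vector<std::string> valuesBefore(InstList::iterator it) {
    InstList::iterator next = std::next(it);
    DVList::iterator stop =
        next == insts.end() ? dbgValues.end() : next->dvBegin;
    std::vector<std::string> out;
    for (DVList::iterator d = it->dvBegin; d != stop; ++d)
      out.push_back(d->desc);
    return out;
  }

  // Erase a non-terminator instruction. Its dbg values join the front of the
  // following instruction's group by moving a single iterator.
  void erase(InstList::iterator it) {
    InstList::iterator next = std::next(it);
    assert(next != insts.end() && "toy model: the last instruction stays");
    next->dvBegin = it->dvBegin;
    insts.erase(it);
  }
};
```

With the four instructions from the diagram, erasing inst2 leaves dv2 and dv3 adjacent in the block's list and both attached to inst3, and no dbg value has to be relinked or copied.
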
> 
> 
> Actually doing it
> -----------------
> 
> LLVM currently has several unfinished migrations going on and a fair amount of technical debt. It's not clear that this project is the top priority for me or anyone else at this moment, so even if people like this idea, don't assume it will be implemented soon. I think similar efficiency reasoning applies to the pointee-type removal API migration that David Blaikie started. Casts are very similar to dbg.values in this way.
> 
> However, I wanted to write it all down before letting it fade from memory. Apologies for the length, I didn't have time to make a shorter RFC. =P