[llvm-dev] [RFC] Moving llvm.dbg.value out of the instruction stream

Reid Kleckner via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 22 18:43:07 PDT 2018

One of the major drawbacks of our current variable tracking debug info
design is that it puts lots of llvm.dbg.value instructions in the
instruction stream. After the debug info BoF at the 2018 dev meeting (last
week), I started to seriously consider a design where we track variable
locations on the side of the instruction stream. People seemed to agree
this idea was good enough to at least explore, so I put together some
exploratory data and ideas.

Our current design tracks variable values with llvm.dbg.value instructions,
which live in the instruction stream. This is crummy, because passes have
to know to filter them. Today we have filtering iterators, but most passes
have not been audited to use them, and we regularly have bugs where
enabling debug info changes what the optimizer does. If we move variable
tracking out of the instruction stream, we can fix this entire class of
bugs by design.

Second, the current design slows down basic block iteration. My intuition
says that, after inlining, we have *tons* of dbg.value instructions
clogging up our basic blocks, slowing down our standard optimizer pipeline.
Linked list iteration isn't fast on modern architectures, so if dbg.values
are an appreciable source of elements, moving it out of the list will speed
up compilation and make -g less of a compile-time bear.

One thing this change will *not* address is the kind of use-before-def bugs
we get today where dbg info unaware passes sink or hoist SSA instructions
across the dbg.value that uses them. It has been suggested that we should
attach the dbg.value directly to the instruction producing the value, but I
think this is a mistake, because it only handles the common case, and not
the special case where the value is produced far from the point of the
assignment. Even if we have the special case attachment for value-producing
instructions, we need new APIs to help passes decide how to update the
debug info when sinking or hoisting across blocks.

Data & Motivation

I used my compile-time test case of the day for this: SemaChecking.cpp. I
re-compiled opt with the attached patch, which gets the total number of
Instructions after CGP, and the number of dbg intrinsics. I compiled
SemaChecking.cpp to bitcode with `clang -cc1 -debug-info-kind=limited
-gcodeview -mllvm -instcombine-lower-dbg-declare=0` to match what Chromium
uses. I think these numbers are general, though. Here are the numbers I got
from my patch:

160402 codegenprepare - Total number of llvm.dbg.* instructions
296080 codegenprepare - Total number of instructions

So, 54% of all instructions are dbg.(value|declare) instructions today!
Moving them out of the main BB ilist would speed up BB iteration by 2x.
That's not necessarily where we spend our time. I happen to already know
that this test case does a lot of BB scanning (PR38829), so I know it will
help this test case, but it may not matter if we make local dominance not

In any case, here is the time to compile to bitcode for this example:
-O2 -g: 89.6s
-O2: 53.6s
-O2 -gmlt: 52.6s

-gmlt should be slower than no debug info, so clearly my system is noisy,
but I think the impact of dbg.value is still clearly observable.

The next thing I noticed is that dbg.values tend to appear in packs. My
theory is that most of this is from multiple levels of inlining of helpers
that forward parameters. Here is a histogram of the length of dbg intrinsic

dbg instructions, run length | freq
1 13086
2 6587
3 3927
4 1605
5 1643
6 1414
7 2165
8 1708
9 226
10 36
11 26
12 14
13 1750
14 70
15 224
16 3
17 154
18 5
19 1
20 2
22 209
23 688
24 705
28 4
29 1
36 1
40 2
71 1

My understanding of the histogram above is that 52% of dbg.values are in
runs of length 8 or greater. You can see some spikes here at 13 and 22-24,
which I think supports my theory about inlining above.

This example is just a standard -O2 large TU in clang. I can only imagine
that the dbg.value runs get longer in ThinLTO, where the inliner runs more.


dbg.value is meant to associate an SSA value with a label. My basic idea is
that it's better to attach a label to the instruction it precedes rather
than creating a separate instruction. However, a label should stay behind
if the instruction it precedes moves. For example,
Instruction::moveAfter/moveBefore should implicitly reattach their variable
tracking info to the next instruction. While working on
https://reviews.llvm.org/D51664, I realized it would be pretty easy to use
the ilist callbacks currently used for symbol table updating to implement

I think making the attachments describe the values of variables before the
instruction is good, because in the general case every block has at least
one instruction, which typically doesn't produce a value: the terminator.
The only terminator that produces a value is invoke, and we can attach any
variable tracking info for it to the first (non-phi?) instruction in the
normal successor.

I think, overall, this is just an internal representation shift. We may
want to change our bitcode and assembly language to better align with our
internal representation, but it should be functionaly equivalent.

IR syntax

This is kind of important, since it affects how developers think about
these and keep them up to date. Consider the following C source:

  int foo(void);
  int bar(void) {
    int v = foo();
    v = foo();
    return v;

I'm proposing moving from something like this:

  %v1 = call i32 @foo()
  call void @llvm.dbg.value(metadata i32 %v1, metadata !123, metadata
!DIExpression) !dbg !456
  %v2 = call i32 @foo()
  call void @llvm.dbg.value(metadata i32 %v2, metadata !123, metadata
!DIExpression) !dbg !456
  ret i32 %v2

To something like this:

  %v1 = call i32 @foo()
  !dbgvalue i32 %v1, variable !123, loc !456 [, expr ...]
  %v2 = call i32 @foo()
  !dbgvalue i32 %v2, variable !123, loc !456 [, expr ...]
  ret i32 %v2

'expr' would be an optional DIExpression, expressed more compactly if
possible, perhaps using our inline !DIExpression parsing for now.

This also avoids the confusion that today the dbgloc attached to a
dbg.value is not actually used to generate line table entries, it's only
for tracking distinct variables created from different inlined call sites.

Data structures

What we're really trying to represent is two separate sequences. For
simplicity and minimal change, I initially propose that we create a
secondary linked list of some new DbgValue data structures. They would
relate kind of like this:

  inst1 -> inst2 -> inst3 -> inst4
    \        \        \        \
    dv1 ->   dv2 ->   dv3 ->   dv4

If we were to delete inst2, dv2 would be reattached to the following

  inst1 -> inst3 ->          inst4
    \        \                 \
    dv1 ->   dv2 ->   dv3 ->   dv4

What's interesting is that once dv2 and dv3 are grouped together, it's
actually a bug if we ever generate code between them. I can imagine more
compact representations than a linked list, but we want to be able to
efficiently join two sequences of dbgvalues when deleting or moving a
single instruction, and linked lists achieve that. The list would be owned
by the BasicBlock, since if a whole block is unreachable, there's no where
to put the value tracking info.

Actually doing it

LLVM currently has several unfinished migrations going on and a fair amount
of technical debt. It's not clear that this project is the top priority for
me or anyone else at this moment, so even if people like this idea, don't
assume it will be implemented soon. I think similar efficiency reasoning
applies to the pointee-type removal API migration that David Blaikie
started. Casts are very similar to dbg.values in this way.

However, I wanted to write it all down before letting it fade from memory.
Apologies for the length, I didn't have time to make a shorter RFC. =P
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181022/71119f60/attachment.html>

More information about the llvm-dev mailing list