[cfe-dev] RFC: Remove uninteresting debug locations at -O0

Fri May 22 16:13:56 PDT 2020

If you intend to manage source locations for an optimizing compiler, you
need a much richer data structure than Clang & LLVM currently support. The
compiler company I worked for went down this road ~25 years ago. It's
actually quite straightforward to build, albeit potentially expensive in
terms of memory and compilation time. But the result is that you can
support accurate debugging of fully-optimized code. The model we used
didn't even try to represent any source location smaller than a line of
code, because we had waaay less memory available back then. The approach we
used can reasonably apply to any granularity of source location you choose
to implement. That said:

   - Any given instruction may "originate from" an arbitrary number of
   distinct source locations (call them LOCs). Consider inline expansion of a
   method, or macro expansion as two obvious examples; there are many more.
   The resulting instructions "originate from" the location of the call (or
   macro) *and also* from the particular line(s) in the method (macro) that
   generated them. So we need to be able to map one
   instruction/program-address to many LOCs.
   - Any given line of code may be implemented by a bunch of instructions
   that may or may not be contiguous. So we need to be able to map a single
   LOC to multiple instructions, ranges of instructions, multiple
   non-contiguous ranges of instructions, etc.
   - Depending on later optimization, there could legitimately be multiple
   distinct instructions *each of which* is "a beginning" of that LOC. More
   multiplicity.
   - Depending on later optimization, there could be multiple distinct
   *instances* of a single LOC. Consider a method that is inline-expanded
   at K call sites.  Each LOC within that method has (at least) K instances.
   Each maps back to the same LOC, but the instances aren't the same as each
   other. This is an important and possibly non-obvious part of the design.
   Loop unrolling is another example; hoisting an invariant out of a loop is
   yet another.

We found by experience that attempting to maintain beginnings (and endings)
of LOC instances through the entire compiler was too hard; we always messed
it up. So what we did instead was to tag each syntax tree (or Optimizer IR
node, or instruction, etc.) with *all* the LOC instances that it's part
of.  We did some quite simple LOC management (see below) throughout IR
generation, optimization, and code generation, then computed the
beginnings, ends, and ranges of LOC instances at the end of the compilation
process via the obvious flow-based algorithms.

Tagging everything with all its LOC info and recovering beginnings/ends
after all transformations are complete makes managing source location data
inside the compiler essentially trivial. It devolves into five simple cases:

   1. When you delete code/IR/instructions (I'll call all of these *code* from
   now on), delete the LOC data along with them. With a reasonable
   implementation this comes for free when deleting the code.
   2. When moving code, move the LOC data along with it/them. Also a
   freebie.
   3. When combining code, union the LOC info. This sometimes happens in
   combination with the next case.
   4. When creating new code (simple case) copy the LOC info from the
   origin from which it was created onto the new code.
   5. When replicating code (more complex case) — inline expansion, loop
   unrolling, macro expansion... — create a new *instance* of each LOC for
   each replication of the origin. New instances are treated as distinct lines
   of code for the purpose of computing begin/end locations. When setting a
   breakpoint at the beginning of line N, for example, you would get
   breakpoint(s) at each beginning of each distinct instance of line N. For
   fancier debug support, one could set breakpoints at particular instances of
   line N without stopping at other instances, and so on.

The pain of implementing such an approach in the compiler is that somebody
has to visit more or less the entire code base to ensure that each thing
that transforms code applies the correct choice from the above 5 cases. The
good news is that each such decision is both local and essentially trivial
— we finished the first cut at the update in our compilers' code (~400KLOC
at the time) in about two person-weeks of effort. Followed, of course, by
testing & debugging time to smoke out the incorrect or missing decisions.

The result of this approach is that you get quality source location
information for minimum-to-low optimization levels at reasonable expense in
memory & compile-time and with only modest expansion of the size of the
debug info. At higher optimization levels you continue to get
fully-accurate source location information (modulo bugs, as with all
things). Depending on how agressive your optimizations are, that location
information may be anything from somewhat non-obvious to utterly
brain-bending. The increase in size for the debug info can be large, but is
roughly proportional to the degree to which code has been transformed.

You can propagate LOCs through essentially arbitrary optimizations: loop
unrolling, vectorization, loop pipelining, cache blocking, or whatever. You
can even track source locations through software loop pipelined, unrolled &
optimized VLIW code, if that happens to be the problem at hand. As an added
benefit, it supports things like providing auditors looking at
safety-critical code with the exact origin(s) of every instruction in the
program. It even helps compiler implementers understand what their
optimizer has done to the program.

Obviously, your debugger has to understand the more complicated model of
source locations. For example, asking for a breakpoint at the beginning of
a source line may require setting either one or many breakpoints in the
program.  Similarly, if you stop at an arbitrary instruction and ask the
debugger where you are in the source, you could wind up with something like
"Stopped at the beginning of line-K-instance-X, somewhere during execution
of line-L-instance-1, and just after the end of line-A". The most useful
breakpoint operations were: "stop at the beginning of (each instance of) a
line" and "stop just after the last instruction of (each instance of) a
line", because those are the points where either nothing for that line has
yet executed or where everything for that line is complete. Obviously,
arbitrary other stuff could be in flight there as well, including prior or
subsequent lines, or whatever.

We also implemented a way to accurately explain to the debugger how to find
the value of variables post optimization, but that'd need another big wall
of text so I'll skip it for now.

Sorry to go on so long. My primary message is that propagating rich source
location information through a compiler is a problem that is conceptually
easy, is tedious but feasible to implement, and that has known proven
solutions. Further details on request, if anyone is interested.

Dean

On Wed, Apr 29, 2020 at 2:59 PM David Blaikie via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

>
>
> On Wed, Apr 29, 2020 at 2:52 PM Reid Kleckner <rnk at google.com> wrote:
>
>> Sure, I can try to elaborate, but I am assuming a bit about what the
>> users desired stepping behavior is. I am assuming in this case that the
>> user is doing the equivalent of `s` or `n` in gdb, and they want to get
>> from one semicolon to the next. If that is not the case, if the user wants
>> the debugger to stop at the location of both getFoo() calls in your
>> example, my suggestion isn't helpful.
>>
>> However, since the last dev meeting, I have been thinking about having IR
>> intrinsics that mark the IR position where a statement begins. The
>> intrinsic would carry a source location. The next instruction emitted with
>> that source location would carry the DWARF is_stmt line table flag.
>>
>> Making this idea work with optimizations is harder, since the
>> instructions that make up a statement may all be removed or hoisted. To
>> make this work, we would have to establish which instructions properly
>> belong to the statement. This could be done by adding a level to the
>> DIScope hierarchy. The statement markers would remain in place, and code
>> motion would happen around them.
>>
>
> In this case there would be no intrinsic, just a new kind of "statement"
> DIScope? yeah, that's the best idea I've had/heard of/discussed for
> statement grouping so far. Certainly seems imminently prototype-able & then
> get a sense for just how expensive scoping everything like that would be.
>
>
>> Late in the codegen pipeline, the first instruction belonging to the most
>> recently activated statement will be emitted with the is_stmt flag.
>>
>> On Wed, Apr 29, 2020 at 9:36 AM Adrian Prantl <aprantl at apple.com> wrote:
>>
>>> Thanks to all of you for sharing your perspective! You all brought up
>>> important aspects that I hadn't considered. The argument that really
>>> clicked for me was Pavel's that you might want to change the value before
>>> the load happens.
>>>
>>> One of my worries here is that what I called "less interesting"
>>> locations might delete more interesting ones when instructions and their
>>> locations are merged. But if we indeed consider the stack slot loads to be
>>> interesting then that is really the best we can do. After all, storing more
>>> than source location per instruction would be a UI design nightmare for a
>>> debugger.
>>>
>>
> Actually I'd kind of love this - I think it'd be really great to describe
> source ranges instead of source locations - probably super size costly
> though (& intermediate compiler memory usage, etc). Being able to describe
> nesting, etc - the same way Clang does with expression/subexpression
> highlighting I think would be wonderful - imagine if every instruction
> weren't just attributed to one location, but to a source range with a
> preferred location (instead of just pointing to the "+" for an add, be able
> to highlight the LHS and RHS of that plus expression, etc).
>
>
>>
>>>
>>> Reid, I would like to learn more about what you mean by statement
>>> tracking. I'm thinking of a statements as the expressions separated by
>>> semicolons, but since my whole example was a single statement I'm assuming
>>> you have something more fine-grained in mind. Perhaps you can post an
>>> example to illustrate what you have in mind?
>>>
>>>
>>> thanks!
>>> -- adrian
>>
>> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200522/86cfad4a/attachment-0001.html>