[llvm-dev] BoF: Debug info for optimized code.

Robinson, Paul via llvm-dev llvm-dev at lists.llvm.org
Thu Nov 10 14:07:06 PST 2016


At the BoF session, Reid Kleckner wrote a few notes on the whiteboard
and then I got a photo of it before the next session started up.  I've
transcribed those notes here, and expanded on them a bit with my own
thoughts.  If anybody else has notes/thoughts, please share them.

Whiteboard notes
----------------
Variable info metrics
- Induction variable tracking
- Contrast -O0 vs -O2 variables, breakpoint locations
- Track line info for side effects only
  (semantic stepping) "key" instructions


Unpacking that a bit...

Induction variable tracking
---------------------------
Somebody (Hal?) observed that in counted loops (I = 1 to N) the counter
often gets transformed into something else more useful (e.g. an offset 
instead of an index).  DWARF is powerful enough to express how to recover 
the original counter value, if only the induction transformation had a way
to describe what it did (or more precisely, how to recover the original
value after what it did).
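
As a rough illustration of what "recovering the original value" could
look like (a sketch only, not something shown at the BoF): assume the
counter I runs from 1 to N over 4-byte elements, and the loop has been
rewritten so that only a running pointer p = base + 4*(I-1) survives.
The tiny stack evaluator below is hypothetical; its operations are
loosely modeled on DWARF expression operators (breg, constu, minus,
div, plus), and the register names "r0" and "r1" and the element size
are assumptions made for the example.

```python
# Hypothetical model: counter I (1..N) over 4-byte elements was replaced by a
# running pointer p = base + 4*(I-1).  "r0" holds base, "r1" holds p.
# The original counter is recovered as (p - base) / 4 + 1.

def evaluate(expr, regs):
    """Evaluate a tiny stack expression; ops loosely mirror DWARF operators."""
    stack = []
    for op, *args in expr:
        if op == "breg":        # push register contents plus a constant offset
            reg, offset = args
            stack.append(regs[reg] + offset)
        elif op == "constu":    # push an unsigned constant
            stack.append(args[0])
        elif op == "minus":     # second-from-top minus top
            top = stack.pop()
            stack.append(stack.pop() - top)
        elif op == "div":       # second-from-top divided by top
            top = stack.pop()
            stack.append(stack.pop() // top)
        elif op == "plus":      # sum of the top two entries
            top = stack.pop()
            stack.append(stack.pop() + top)
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack.pop()

# Location expression a transformation could emit to describe the lost counter.
RECOVER_I = [
    ("breg", "r1", 0), ("breg", "r0", 0), ("minus",),
    ("constu", 4), ("div",),
    ("constu", 1), ("plus",),
]

def recovered_counters(base, n):
    """Run the transformed (pointer-based) loop and recover I on each iteration."""
    seen = []
    p, end = base, base + 4 * n
    while p != end:
        seen.append(evaluate(RECOVER_I, {"r0": base, "r1": p}))
        p += 4
    return seen
```

The point of the sketch is that the transformation knows the relation
between p and I at the moment it rewrites the loop; what is missing is
a way for it to record the inverse so the debug info can carry it.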


Contrast -O0 vs -O2 variables, breakpoint locations
---------------------------------------------------
This came up during a discussion on debug-info-quality testing/metrics.
One metric for quality of debug info of optimized code is to compare
what is "available" at -O0 to what is "available" at -O2.  This can be
applied to both kinds of debug info affected by optimizations: whether a
variable is available (has a defined location) and whether a breakpoint
is available (the line has a defined "is-a-statement" address).

If you look at the set of instructions where a variable has a valid 
location, how does that set compare to the set of instructions for the 
lexical scope that contains the variable?  If you look at the sets of 
breakpoint locations described by the line table, how does the set for 
-O2 compare to the set for -O0?
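
The two comparisons above can be sketched as simple set arithmetic.
This is only an illustration under assumed inputs: the function names
and the tuple layout (address, line, is_stmt) are made up for the
example, and a real tool would have to extract these sets from the
location lists and line table of each build.

```python
def variable_coverage(scope_addrs, located_addrs):
    """Fraction of the scope's instruction addresses where the variable
    has a defined location."""
    scope = set(scope_addrs)
    if not scope:
        return 0.0
    return len(scope & set(located_addrs)) / len(scope)

def breakpoint_line_retention(o0_rows, o2_rows):
    """Compare the source lines that have at least one is_stmt row.

    Rows are (address, line, is_stmt) tuples.  Addresses differ between
    builds, so the comparison is made on line numbers, not addresses.
    Returns (fraction of -O0 lines still available at -O2, lost lines).
    """
    o0_lines = {line for _, line, is_stmt in o0_rows if is_stmt}
    o2_lines = {line for _, line, is_stmt in o2_rows if is_stmt}
    lost = sorted(o0_lines - o2_lines)
    retained = len(o0_lines & o2_lines) / len(o0_lines) if o0_lines else 1.0
    return retained, lost

# Made-up example data, not measurements from any real compiler.
O0_ROWS = [(0x00, 10, True), (0x04, 11, True), (0x08, 12, True), (0x0c, 13, True)]
O2_ROWS = [(0x00, 10, True), (0x04, 12, False), (0x08, 13, True)]
```

With the made-up rows above, lines 11 and 12 have no is_stmt address in
the -O2 table, so half of the -O0 breakpoint lines are lost.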

It's not hard to imagine tooling that would permit comparisons of this
kind, and some people have had tooling like that in previous jobs.


Track line info for side effects only
(aka semantic stepping or "key" instructions)
---------------------------------------------
This idea is based on two observations:
(1) Optimization tends to shuffle instructions around, so that you end
    up with instructions "from" a given source line being mixed in with
    instructions "from" other source lines.  If we very precisely track
    the source line for every instruction, then single-stepping through
    "the source" in a debugger becomes very back-and-forth and choosing
    a good place to set a breakpoint on "the line" becomes a dicey
    proposition.
(2) If you look at the set of instructions generated for a given line,
    it's easy to conclude that "some are more equal than others."  This
    means for something like a simple assignment, the load is kind of
    important, the ZEXT not so much, and the store is really the thing.
So, picking and choosing which instructions to mark as good stopping
places could well improve the user-experience without significantly
interfering with the user's ability to see what their program is doing.

[Okay, I'm really going beyond what we said in the BoF, but I think it's
a worthwhile point to expand upon.]

Let's unpack an assignment from an 'unsigned short' to an 'unsigned long'
as an example.  This basically turns into a load/ZEXT/store sequence.

If you have an optimization that hoists the load+ZEXT above an 'if' or
loop-top, but leaves the store down inside the 'then' part or loop body,
is it really important to tag the load+ZEXT with the original source
line?  If you want to stop on "the line," doing it just before the store
is really the critical thing.

That is, the store is the "key" or "semantically significant" instruction
here, and the load/ZEXT are not so important.  You can have a smooth,
user-friendly debugging experience if you mark the store as a good
stopping point for that statement, and don't mark the load/ZEXT that way
(even though, pedantically, the load/ZEXT are also "from" the same source
statement).
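
To make the difference concrete, here is a small sketch of the hoisted
load/ZEXT/store case.  The instruction stream, the line numbers, and
the choice of which opcodes count as "key" are all assumptions for the
example; it models a debugger stepping straight through the code and
stopping whenever the marked line changes.

```python
# Hypothetical instruction stream after hoisting: line 7's load and zext now
# sit above the 'if' on line 6, while its store stays inside the 'then' part.
INSTRS = [
    (0x00, 5, "call"),
    (0x04, 7, "load"),   # hoisted above the 'if'
    (0x08, 7, "zext"),   # hoisted above the 'if'
    (0x0c, 6, "cmp"),
    (0x10, 6, "br"),
    (0x14, 7, "store"),  # the "key" instruction for line 7
    (0x18, 8, "ret"),
]

# Assumed notion of "key": instructions with side effects or control flow.
KEY_OPCODES = {"store", "call", "br", "ret"}

def stopping_lines(instrs, key_only):
    """Sequence of source lines a debugger would stop on when stepping
    through instrs in address order.

    With key_only=False every instruction is a candidate stopping place;
    with key_only=True only instructions in KEY_OPCODES are.
    """
    lines = []
    for _, line, opcode in instrs:
        if key_only and opcode not in KEY_OPCODES:
            continue
        if not lines or lines[-1] != line:
            lines.append(line)
    return lines
```

Marking every instruction gives the stepping order 5, 7, 6, 7, 8, which
visits line 7 before and after line 6; marking only the key
instructions gives 5, 6, 7, 8.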

Now, how far you take this idea, and in what circumstances, is arguable,
because it very quickly enters the arena of human-factors quality, and
people may differ in their preferences for "precise" versus "smooth"
single-stepping or breakpoint-location experience.  But these things
definitely have an effect on the experience, and we have to be willing
to trade off one for the other in some cases.

Thanks,
--paulr


