[LLVMdev] First-class debug info IR: MDLocation

Fri Oct 24 16:37:49 PDT 2014

On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

> I've attached a preliminary patch for `MDLocation` as a follow-up to the
> RFC [1] last week.  It's not commit-ready -- in particular, it squashes
> a bunch of commits together and doesn't pass `make check` -- but I think
> it's close enough to indicate the direction and work toward consensus.
>
> [1]: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-October/077715.html
>
> IMO, the files to focus on are:
>
>     include/llvm/IR/DebugInfo.h
>     include/llvm/IR/DebugLoc.h
>     include/llvm/IR/Metadata.h
>     include/llvm/IR/Value.h
>     lib/AsmParser/LLLexer.cpp
>     lib/AsmParser/LLParser.cpp
>     lib/AsmParser/LLParser.h
>     lib/AsmParser/LLToken.h
>     lib/Bitcode/Reader/BitcodeReader.cpp
>     lib/Bitcode/Writer/BitcodeWriter.cpp
>     lib/Bitcode/Writer/ValueEnumerator.cpp
>     lib/Bitcode/Writer/ValueEnumerator.h
>     lib/IR/AsmWriter.cpp
>     lib/IR/AsmWriter.h
>     lib/IR/DebugInfo.cpp
>     lib/IR/DebugLoc.cpp
>     lib/IR/LLVMContextImpl.cpp
>     lib/IR/LLVMContextImpl.h
>     lib/IR/Metadata.cpp
>
> Using `Value` instead of `MDNode`
> =================================
>
> A number of APIs expect `MDNode` -- previously, the only referenceable
> type of metadata -- but this patch (and the ones that will follow) have
> referenceable metadata that *do not* inherit from `MDNode`.  Metadata
> APIs such as `Instruction::getMetadata()` and
> `NamedMDNode::getOperand()` need to return non-`MDNode` metadata.
>
> I plan to commit the API changes incrementally so we can fix any issues
> there before pushing the functionality changes.  Unfortunately, this
> currently adds a lot of noise to the (squashed) patch.
>
> Introducing `MDLocation`
> ========================
>
> Of course, this adds `MDLocation`, the first subclass of `MDUser`.  This
> is a first-class IR type that has two other representations:
> `DILocation` (which now trivially wraps `MDLocation` instead of
> `MDNode`) and `DebugLoc`.
>
> I've genericised the code in `LLParser` (and elsewhere) to sketch out
> how adding other `MDUser` subclasses will go.  Perhaps I used the wrong
> axis, but we can adjust it as we go.
>
> Usage examples:
>
>     !6 = metadata MDLocation(line: 43, column: 7, scope: !4)
>     !7 = metadata MDLocation(scope: !5, line: 67, inlinedAt: !6)
>
> The fields can be listed in any order.  The `scope:` field is required,
> but the others are optional (`line:` and `column:` default to `0`,
> `inlinedAt:` defaults to `null`).
>
> (Note that in the RFC I referred to this as an `MDLineTable`, but
> `MDLocation` is a better name.  If/when this work supersedes the
> `DIDescriptor` hierarchy, it'll likely get renamed to `DILocation`, but
> for now there's a name clash.)
>
> Where this is heading
> =====================
>
> Let's look at a concrete example.  Here's some simple C++ code:
>
>     $ cat t.h
>     struct T { short a; short b; };
>     $ cat foo.cpp
>     #include "t.h"
>     int foo(T t) { return t.a + t.b; }
>     $ cat bar.cpp
>     #include "t.h"
>     int foo(T t);
>     int bar(T t) { return foo(t) * 2; }
>
> Looking forward, after refactoring ownership and uniquing and fixing up
> a few schema issues, I'd expect the above to link into something like
> the following:
>
>     !0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")
>     !1 = metadata DIFile(filename: "./t.h", directory: "/path/to")
>     !2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")
>     !3 = metadata DIBaseType(name: "short", size: 16, align: 16)
>     !5 = metadata DIBaseType(name: "int", size: 32, align: 32)
>     !6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",
>                                   file: !1, line: 1, size: 32, align: 16)
>     !7 = metadata DIMember(line: 1, file: !1, type: !3,
>                            name: "a", size: 16, align: 16, context: !6)
>     !8 = metadata DIMember(line: 1, file: !1, type: !3,
>                            name: "b", size: 16, align: 16, context: !6)
>     !9 = metadata DISubroutineType(args: [ !5, !6 ])
>     !10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0 ",
>                                  retainedUniqueTypes: [ !6 ])
>     !11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",
>                                 handle: i32(i32)* @_Z3foo1T, file: !0,
>                                 type: !9, context: !10)
>     !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6,
>                                  context: !11)
>     !13 = metadata DILocation(line: 2, column: 11, scope: !11)
>     !14 = metadata DILocation(line: 2, column: 16, scope: !11)
>     !15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0 ",
>                                  retainedUniqueTypes: [ !6 ])
>     !16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",
>                                 handle: i32 (i32)* @_Z3bar1T, file: !2,
>                                 type: !9, context: !15)
>     !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6,
>                                  context: !16)
>     !18 = metadata DILocation(line: 3, column: 11, scope: !16)
>     !19 = metadata DILocation(line: 3, column: 23, scope: !16)
>
> Notice that only the links to parents (i.e., `context:`) are explicit
> here -- backlinks are implied.  For example, !7 and !8 point to !6, but
> not the reverse.
>

This may be a problem - the difference between nodes in a
structure/class_type's member list, and those that are not in the member
list but refer to the class/structure as their parent is meaningful. Type
units use this distinction to avoid emitting instantiation-specific data
into the canonical type unit (nested classes, implicit special members,
member template instantiations, etc).

Not sure what the right answer will be there.

>
> This has the interesting property of removing all cycles from
> serialization (assembly and bitcode).
>
> Making debug info assembly readable and writable
> ================================================
>
> Moreover, we're now in a place where it's trivial to express the
> "context" pointer structurally.  Here's the same debug info as above,
> using syntactic sugar to fill the "context" pointers:
>
>     !0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")
>     !1 = metadata DIFile(filename: "./t.h", directory: "/path/to")
>     !2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")
>     !3 = metadata DIBaseType(name: "short", size: 16, align: 16)
>     !5 = metadata DIBaseType(name: "int", size: 32, align: 32)
>     !6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",
>                                   file: !1, line: 1, size: 32, align: 16) {
>       !7 = metadata DIMember(line: 1, file: !1, type: !3,
>                              name: "a", size: 16, align: 16)
>       !8 = metadata DIMember(line: 1, file: !1, type: !3,
>                              name: "b", size: 16, align: 16)
>     } ; !6
>     !9 = metadata DISubroutineType(args: [ !5, !6 ])
>     !10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0 ",
>                                  retainedUniqueTypes: [ !6 ]) {
>       !11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",
>                                   handle: i32(i32)* @_Z3foo1T, file: !0,
>                                   type: !9) {
>         !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6)
>         !13 = metadata DILocation(line: 2, column: 11)
>         !14 = metadata DILocation(line: 2, column: 16)
>       } ; !11
>     } ; !10
>     !15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0 ",
>                                  retainedUniqueTypes: [ !6 ]) {
>       !16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",
>                                   handle: i32 (i32)* @_Z3bar1T, file: !2,
>                                   type: !9) {
>         !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6)
>         !18 = metadata DILocation(line: 3, column: 11)
>         !19 = metadata DILocation(line: 3, column: 23)
>       } ; !16
>     } ; !15
>
> This assembly has the following advantages over the status quo:
>
>   - Fields are named.  Aside from readability, this prevents
>     adding/reordering fields in the schema from requiring testcase
>     updates.
>
>   - Serialization graph becomes a DAG.  Aside from readability, this
>     removes most RAUW from assembly (and all RAUW from bitcode).
>
>   - Structure is clear.
>
> Bike sheds to paint
> ===================
>
>  1. Should we trim some boilerplate?  E.g., it would be trivial to
>     change:
>
>         !6 = metadata MDLocation(line: 43, column: 7, scope: !4)
>
>     to:
>
>         !6 = MDLocation(line: 43, column: 7, scope: !4)
>
>     This would not complicate `LLParser`.  Thoughts?
>
>  2. Which of the two "end goal" syntaxes is better: flat, or
>     hierarchical?  Better for what?  Why?
>
>     The flat one might be better for FileCheck-ing (not sure), but IMO
>     the hierarchical one is much saner for us humans, and that's the
>     main point of assembly.  It wouldn't be hard to default to one and
>     write the other based on a command-line flag -- is that a good idea?
>

I don't think the flat form will be particularly more compelling for
testing. We test the hierarchical DWARF dumping all the time - doing so
pedantically does require littering CHECK-NOT: DW_TAG|NULL in a fair few
places, but even without those the tests aren't /too/ bad.

If checking hierarchical data is a problem, chances are we just want to
improve FileCheck until it isn't, since we already have hierarchies we have
to check.

>
>  3. Assembly syntax is pretty easy to change, so this doesn't have to be
>     perfect now.  Nevertheless, is there a magical syntax that would be
>     easier to read/write/FileCheck?
>

Presumably when dumping the fields will come in a specific, defined order?
(probably not preserving the original source order, etc) Variation there
would probably make checking harder than it needs to be.

Would it be possible to omit the names of unreferenced nested metadata? (if
you have a bunch of member functions in a class, but don't need to refer to
them elsewhere (eg: those member functions aren't defined in this
translation unit)) - that'd help readability/writeability, but probably
wouldn't impact FileCheck.

Also the whole "metadata " prefix on anything is a bit verbose, if we can
omit that in some/many places, that'll help reduce the visual noise/improve
readability/writeability.

But most/all of that I imagine is fairly easily incremental
change/improvement, nothing fundamental that needs to be chosen up-front.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141024/358bbd69/attachment.html>