<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I've attached a preliminary patch for `MDLocation` as a follow-up to the<br>

RFC [1] last week.  It's not commit-ready -- in particular, it squashes<br>

a bunch of commits together and doesn't pass `make check` -- but I think<br>

it's close enough to indicate the direction and work toward consensus.<br>

<br>

[1]: <a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-October/077715.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-October/077715.html</a><br>

<br>

IMO, the files to focus on are:<br>

<br>

    include/llvm/IR/DebugInfo.h<br>

    include/llvm/IR/DebugLoc.h<br>

    include/llvm/IR/Metadata.h<br>

    include/llvm/IR/Value.h<br>

    lib/AsmParser/LLLexer.cpp<br>

    lib/AsmParser/LLParser.cpp<br>

    lib/AsmParser/LLParser.h<br>

    lib/AsmParser/LLToken.h<br>

    lib/Bitcode/Reader/BitcodeReader.cpp<br>

    lib/Bitcode/Writer/BitcodeWriter.cpp<br>

    lib/Bitcode/Writer/ValueEnumerator.cpp<br>

    lib/Bitcode/Writer/ValueEnumerator.h<br>

    lib/IR/AsmWriter.cpp<br>

    lib/IR/AsmWriter.h<br>

    lib/IR/DebugInfo.cpp<br>

    lib/IR/DebugLoc.cpp<br>

    lib/IR/LLVMContextImpl.cpp<br>

    lib/IR/LLVMContextImpl.h<br>

    lib/IR/Metadata.cpp<br>

<br>

Using `Value` instead of `MDNode`<br>

=================================<br>

<br>

A number of APIs expect `MDNode` -- previously, the only referenceable<br>

type of metadata -- but this patch (and the ones that will follow) have<br>

referenceable metadata that *do not* inherit from `MDNode`.  Metadata<br>

APIs such as `Instruction::getMetadata()` and<br>

`NamedMDNode::getOperand()` need to return non-`MDNode` metadata.<br>

<br>

I plan to commit the API changes incrementally so we can fix any issues<br>

there before pushing the functionality changes.  Unfortunately, this<br>

currently adds a lot of noise to the (squashed) patch.<br>

<br>

Introducing `MDLocation`<br>

========================<br>

<br>

Of course, this adds `MDLocation`, the first subclass of `MDUser`.  This<br>

is a first-class IR type that has two other representations:<br>

`DILocation` (which now trivially wraps `MDLocation` instead of<br>

`MDNode`) and `DebugLoc`.<br>

<br>

I've genericised the code in `LLParser` (and elsewhere) to sketch out<br>

how adding other `MDUser` subclasses will go.  Perhaps I used the wrong<br>

axis, but we can adjust it as we go.<br>

<br>

Usage examples:<br>

<br>

    !6 = metadata MDLocation(line: 43, column: 7, scope: !4)<br>

    !7 = metadata MDLocation(scope: !5, line: 67, inlinedAt: !6)<br>

<br>

The fields can be listed in any order.  The `scope:` field is required,<br>

but the others are optional (`line:` and `column:` default to `0`,<br>

`inlinedAt:` defaults to `null`).<br>

<br>

(Note that in the RFC I referred to this as an `MDLineTable`, but<br>

`MDLocation` is a better name.  If/when this work supersedes the<br>

`DIDescriptor` hierarchy, it'll likely get renamed to `DILocation`, but<br>

for now there's a name clash.)<br>

<br>

Where this is heading<br>

=====================<br>

<br>

Let's look at a concrete example.  Here's some simple C++ code:<br>

<br>

    $ cat t.h<br>

    struct T { short a; short b; };<br>

    $ cat foo.cpp<br>

    #include "t.h"<br>

    int foo(T t) { return t.a + t.b; }<br>

    $ cat bar.cpp<br>

    #include "t.h"<br>

    int foo(T t);<br>

    int bar(T t) { return foo(t) * 2; }<br>

<br>

Looking forward, after refactoring ownership and uniquing and fixing up<br>

a few schema issues, I'd expect the above to link into something like<br>

the following:<br>

<br>

    !0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")<br>

    !1 = metadata DIFile(filename: "./t.h", directory: "/path/to")<br>

    !2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")<br>

    !3 = metadata DIBaseType(name: "short", size: 16, align: 16)<br>

    !5 = metadata DIBaseType(name: "int", size: 32, align: 32)<br>

    !6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",<br>

                                  file: !1, line: 1, size: 32, align: 16)<br>

    !7 = metadata DIMember(line: 1, file: !1, type: !3,<br>

                           name: "a", size: 16, align: 16, context: !6)<br>

    !8 = metadata DIMember(line: 1, file: !1, type: !3,<br>

                           name: "b", size: 16, align: 16, context: !6)<br>

    !9 = metadata DISubroutineType(args: [ !5, !6 ])<br>

    !10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,<br>

                                 producer: "clang version 3.6.0 ",<br>

                                 retainedUniqueTypes: [ !6 ])<br>

    !11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",<br>

                                handle: i32(i32)* @_Z3foo1T, file: !0,<br>

                                type: !9, context: !10)<br>

    !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6,<br>

                                 context: !11)<br>

    !13 = metadata DILocation(line: 2, column: 11, scope: !11)<br>

    !14 = metadata DILocation(line: 2, column: 16, scope: !11)<br>

    !15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,<br>

                                 producer: "clang version 3.6.0 ",<br>

                                 retainedUniqueTypes: [ !6 ])<br>

    !16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",<br>

                                handle: i32 (i32)* @_Z3bar1T, file: !2,<br>

                                type: !9, context: !15)<br>

    !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6,<br>

                                 context: !16)<br>

    !18 = metadata DILocation(line: 3, column: 11, scope: !16)<br>

    !19 = metadata DILocation(line: 3, column: 23, scope: !16)<br>

<br>

Notice that only the links to parents (i.e., `context:`) are explicit<br>

here -- backlinks are implied.  For example, !7 and !8 point to !6, but<br>

not the reverse.<br></blockquote><div><br>This may be a problem - the difference between nodes in a structure/class_type's member list, and those that are not in the member list but refer to the class/structure as their parent is meaningful. Type units use this distinction to avoid emitting instantiation-specific data into the canonical type unit (nested classes, implicit special members, member template instantiations, etc).<br><br>Not sure what the right answer will be there.<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

This has the interesting property of removing all cycles from<br>

serialization (assembly and bitcode).<br>

<br>

Making debug info assembly readable and writable<br>

================================================<br>

<br>

Moreover, we're now in a place where it's trivial to express the<br>

"context" pointer structurally.  Here's the same debug info as above,<br>

using syntactic sugar to fill the "context" pointers:<br>

<br>

    !0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")<br>

    !1 = metadata DIFile(filename: "./t.h", directory: "/path/to")<br>

    !2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")<br>

    !3 = metadata DIBaseType(name: "short", size: 16, align: 16)<br>

    !5 = metadata DIBaseType(name: "int", size: 32, align: 32)<br>

    !6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",<br>

                                  file: !1, line: 1, size: 32, align: 16) {<br>

      !7 = metadata DIMember(line: 1, file: !1, type: !3,<br>

                             name: "a", size: 16, align: 16)<br>

      !8 = metadata DIMember(line: 1, file: !1, type: !3,<br>

                             name: "b", size: 16, align: 16)<br>

    } ; !6<br>

    !9 = metadata DISubroutineType(args: [ !5, !6 ])<br>

    !10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,<br>

                                 producer: "clang version 3.6.0 ",<br>

                                 retainedUniqueTypes: [ !6 ]) {<br>

      !11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",<br>

                                  handle: i32(i32)* @_Z3foo1T, file: !0,<br>

                                  type: !9) {<br>

        !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6)<br>

        !13 = metadata DILocation(line: 2, column: 11)<br>

        !14 = metadata DILocation(line: 2, column: 16)<br>

      } ; !11<br>

    } ; !10<br>

    !15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,<br>

                                 producer: "clang version 3.6.0 ",<br>

                                 retainedUniqueTypes: [ !6 ]) {<br>

      !16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",<br>

                                  handle: i32 (i32)* @_Z3bar1T, file: !2,<br>

                                  type: !9) {<br>

        !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6)<br>

        !18 = metadata DILocation(line: 3, column: 11)<br>

        !19 = metadata DILocation(line: 3, column: 23)<br>

      } ; !16<br>

    } ; !15<br>

<br>

This assembly has the following advantages over the status quo:<br>

<br>

  - Fields are named.  Aside from readability, this prevents<br>

    adding/reordering fields in the schema from requiring testcase<br>

    updates.<br>

<br>

  - Serialization graph becomes a DAG.  Aside from readability, this<br>

    removes most RAUW from assembly (and all RAUW from bitcode).<br>

<br>

  - Structure is clear.<br>

<br>

Bike sheds to paint<br>

===================<br>

<br>

 1. Should we trim some boilerplate?  E.g., it would be trivial to<br>

    change:<br>

<br>

        !6 = metadata MDLocation(line: 43, column: 7, scope: !4)<br>

<br>

    to:<br>

<br>

        !6 = MDLocation(line: 43, column: 7, scope: !4)<br>

<br>

    This would not complicate `LLParser`.  Thoughts?<br>

<br>

 2. Which of the two "end goal" syntaxes is better: flat, or<br>

    hierarchical?  Better for what?  Why?<br>

<br>

    The flat one might be better for FileCheck-ing (not sure), but IMO<br>

    the hierarchical one is much saner for us humans, and that's the<br>

    main point of assembly.  It wouldn't be hard to default to one and<br>

    write the other based on a command-line flag -- is that a good idea?<br></blockquote><div><br>I don't think the flat form will be particularly more compelling for testing. We test the hierarchical DWARF dumping all the time - doing so pedantically does require littering CHECK-NOT: DW_TAG|NULL in a fair few places, but even without those the tests aren't /too/ bad.<br><br>If checking hierarchical data is a problem, chances are we just want to improve FileCheck until it isn't, since we already have hierarchies we have to check.<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

 3. Assembly syntax is pretty easy to change, so this doesn't have to be<br>

    perfect now.  Nevertheless, is there a magical syntax that would be<br>

    easier to read/write/FileCheck?<br></blockquote><div><br>Presumably when dumping the fields will come in a specific, defined order? (probably not preserving the original source order, etc) Variation there would probably make checking harder than it needs to be.<br><br>Would it be possible to omit the names of unreferenced nested metadata? (if you have a bunch of member functions in a class, but don't need to refer to them elsewhere (eg: those member functions aren't defined in this translation unit)) - that'd help readability/writeability, but probably wouldn't impact FileCheck.<br><br>Also the whole "metadata " prefix on anything is a bit verbose, if we can omit that in some/many places, that'll help reduce the visual noise/improve readability/writeability.<br><br>But most/all of that I imagine is fairly easily incremental change/improvement, nothing fundamental that needs to be chosen up-front.<br> </div></div><br></div></div>