<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I've attached a preliminary patch for `MDLocation` as a follow-up to the<br>
RFC [1] last week. It's not commit-ready -- in particular, it squashes<br>
a bunch of commits together and doesn't pass `make check` -- but I think<br>
it's close enough to indicate the direction and work toward consensus.<br>
<br>
[1]: <a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-October/077715.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-October/077715.html</a><br>
<br>
IMO, the files to focus on are:<br>
<br>
include/llvm/IR/DebugInfo.h<br>
include/llvm/IR/DebugLoc.h<br>
include/llvm/IR/Metadata.h<br>
include/llvm/IR/Value.h<br>
lib/AsmParser/LLLexer.cpp<br>
lib/AsmParser/LLParser.cpp<br>
lib/AsmParser/LLParser.h<br>
lib/AsmParser/LLToken.h<br>
lib/Bitcode/Reader/BitcodeReader.cpp<br>
lib/Bitcode/Writer/BitcodeWriter.cpp<br>
lib/Bitcode/Writer/ValueEnumerator.cpp<br>
lib/Bitcode/Writer/ValueEnumerator.h<br>
lib/IR/AsmWriter.cpp<br>
lib/IR/AsmWriter.h<br>
lib/IR/DebugInfo.cpp<br>
lib/IR/DebugLoc.cpp<br>
lib/IR/LLVMContextImpl.cpp<br>
lib/IR/LLVMContextImpl.h<br>
lib/IR/Metadata.cpp<br>
<br>
Using `Value` instead of `MDNode`<br>
=================================<br>
<br>
A number of APIs expect `MDNode` -- previously, the only referenceable<br>
type of metadata -- but this patch (and the ones that will follow) have<br>
referenceable metadata that *do not* inherit from `MDNode`. Metadata<br>
APIs such as `Instruction::getMetadata()` and<br>
`NamedMDNode::getOperand()` need to return non-`MDNode` metadata.<br>
<br>
I plan to commit the API changes incrementally so we can fix any issues<br>
there before pushing the functionality changes. Unfortunately, this<br>
currently adds a lot of noise to the (squashed) patch.<br>
<br>
Introducing `MDLocation`<br>
========================<br>
<br>
Of course, this adds `MDLocation`, the first subclass of `MDUser`. This<br>
is a first-class IR type that has two other representations:<br>
`DILocation` (which now trivially wraps `MDLocation` instead of<br>
`MDNode`) and `DebugLoc`.<br>
<br>
I've genericised the code in `LLParser` (and elsewhere) to sketch out<br>
how adding other `MDUser` subclasses will go. Perhaps I used the wrong<br>
axis, but we can adjust it as we go.<br>
<br>
Usage examples:<br>
<br>
!6 = metadata MDLocation(line: 43, column: 7, scope: !4)<br>
!7 = metadata MDLocation(scope: !5, line: 67, inlinedAt: !6)<br>
<br>
The fields can be listed in any order. The `scope:` field is required,<br>
but the others are optional (`line:` and `column:` default to `0`,<br>
`inlinedAt:` defaults to `null`).<br>
<br>
(Note that in the RFC I referred to this as an `MDLineTable`, but<br>
`MDLocation` is a better name. If/when this work supersedes the<br>
`DIDescriptor` hierarchy, it'll likely get renamed to `DILocation`, but<br>
for now there's a name clash.)<br>
<br>
Where this is heading<br>
=====================<br>
<br>
Let's look at a concrete example. Here's some simple C++ code:<br>
<br>
$ cat t.h<br>
struct T { short a; short b; };<br>
$ cat foo.cpp<br>
#include "t.h"<br>
int foo(T t) { return t.a + t.b; }<br>
$ cat bar.cpp<br>
#include "t.h"<br>
int foo(T t);<br>
int bar(T t) { return foo(t) * 2; }<br>
<br>
Looking forward, after refactoring ownership and uniquing and fixing up<br>
a few schema issues, I'd expect the above to link into something like<br>
the following:<br>
<br>
!0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")<br>
!1 = metadata DIFile(filename: "./t.h", directory: "/path/to")<br>
!2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")<br>
!3 = metadata DIBaseType(name: "short", size: 16, align: 16)<br>
!5 = metadata DIBaseType(name: "int", size: 32, align: 32)<br>
!6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",<br>
file: !1, line: 1, size: 32, align: 16)<br>
!7 = metadata DIMember(line: 1, file: !1, type: !3,<br>
name: "a", size: 16, align: 16, context: !6)<br>
!8 = metadata DIMember(line: 1, file: !1, type: !3,<br>
name: "b", size: 16, align: 16, context: !6)<br>
!9 = metadata DISubroutineType(args: [ !5, !6 ])<br>
!10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,<br>
producer: "clang version 3.6.0 ",<br>
retainedUniqueTypes: [ !6 ])<br>
!11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",<br>
handle: i32(i32)* @_Z3foo1T, file: !0,<br>
type: !9, context: !10)<br>
!12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6,<br>
context: !11)<br>
!13 = metadata DILocation(line: 2, column: 11, scope: !11)<br>
!14 = metadata DILocation(line: 2, column: 16, scope: !11)<br>
!15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,<br>
producer: "clang version 3.6.0 ",<br>
retainedUniqueTypes: [ !6 ])<br>
!16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",<br>
handle: i32 (i32)* @_Z3bar1T, file: !2,<br>
type: !9, context: !15)<br>
!17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6,<br>
context: !16)<br>
!18 = metadata DILocation(line: 3, column: 11, scope: !16)<br>
!19 = metadata DILocation(line: 3, column: 23, scope: !16)<br>
<br>
Notice that only the links to parents (i.e., `context:`) are explicit<br>
here -- backlinks are implied. For example, !7 and !8 point to !6, but<br>
not the reverse.<br></blockquote><div><br>This may be a problem - the difference between nodes in a structure/class_type's member list, and those that are not in the member list but refer to the class/structure as their parent is meaningful. Type units use this distinction to avoid emitting instantiation-specific data into the canonical type unit (nested classes, implicit special members, member template instantiations, etc).<br><br>Not sure what the right answer will be there.<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
This has the interesting property of removing all cycles from<br>
serialization (assembly and bitcode).<br>
<br>
Making debug info assembly readable and writable<br>
================================================<br>
<br>
Moreover, we're now in a place where it's trivial to express the<br>
"context" pointer structurally. Here's the same debug info as above,<br>
using syntactic sugar to fill the "context" pointers:<br>
<br>
!0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")<br>
!1 = metadata DIFile(filename: "./t.h", directory: "/path/to")<br>
!2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")<br>
!3 = metadata DIBaseType(name: "short", size: 16, align: 16)<br>
!5 = metadata DIBaseType(name: "int", size: 32, align: 32)<br>
!6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",<br>
file: !1, line: 1, size: 32, align: 16) {<br>
!7 = metadata DIMember(line: 1, file: !1, type: !3,<br>
name: "a", size: 16, align: 16)<br>
!8 = metadata DIMember(line: 1, file: !1, type: !3,<br>
name: "b", size: 16, align: 16)<br>
} ; !6<br>
!9 = metadata DISubroutineType(args: [ !5, !6 ])<br>
!10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,<br>
producer: "clang version 3.6.0 ",<br>
retainedUniqueTypes: [ !6 ]) {<br>
!11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",<br>
handle: i32(i32)* @_Z3foo1T, file: !0,<br>
type: !9) {<br>
!12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6)<br>
!13 = metadata DILocation(line: 2, column: 11)<br>
!14 = metadata DILocation(line: 2, column: 16)<br>
} ; !11<br>
} ; !10<br>
!15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,<br>
producer: "clang version 3.6.0 ",<br>
retainedUniqueTypes: [ !6 ]) {<br>
!16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",<br>
handle: i32 (i32)* @_Z3bar1T, file: !2,<br>
type: !9) {<br>
!17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6)<br>
!18 = metadata DILocation(line: 3, column: 11)<br>
!19 = metadata DILocation(line: 3, column: 23)<br>
} ; !16<br>
} ; !15<br>
<br>
This assembly has the following advantages over the status quo:<br>
<br>
- Fields are named. Aside from readability, this prevents<br>
adding/reordering fields in the schema from requiring testcase<br>
updates.<br>
<br>
- Serialization graph becomes a DAG. Aside from readability, this<br>
removes most RAUW from assembly (and all RAUW from bitcode).<br>
<br>
- Structure is clear.<br>
<br>
Bike sheds to paint<br>
===================<br>
<br>
1. Should we trim some boilerplate? E.g., it would be trivial to<br>
change:<br>
<br>
!6 = metadata MDLocation(line: 43, column: 7, scope: !4)<br>
<br>
to:<br>
<br>
!6 = MDLocation(line: 43, column: 7, scope: !4)<br>
<br>
This would not complicate `LLParser`. Thoughts?<br>
<br>
2. Which of the two "end goal" syntaxes is better: flat, or<br>
hierarchical? Better for what? Why?<br>
<br>
The flat one might be better for FileCheck-ing (not sure), but IMO<br>
the hierarchical one is much saner for us humans, and that's the<br>
main point of assembly. It wouldn't be hard to default to one and<br>
write the other based on a command-line flag -- is that a good idea?<br></blockquote><div><br>I don't think the flat form will be particularly more compelling for testing. We test the hierarchical DWARF dumping all the time - doing so pedantically does require littering CHECK-NOT: DW_TAG|NULL in a fair few places, but even without those the tests aren't /too/ bad.<br><br>If checking hierarchical data is a problem, chances are we just want to improve FileCheck until it isn't, since we already have hierarchies we have to check.<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
3. Assembly syntax is pretty easy to change, so this doesn't have to be<br>
perfect now. Nevertheless, is there a magical syntax that would be<br>
easier to read/write/FileCheck?<br></blockquote><div><br>Presumably when dumping the fields will come in a specific, defined order? (probably not preserving the original source order, etc) Variation there would probably make checking harder than it needs to be.<br><br>Would it be possible to omit the names of unreferenced nested metadata? (if you have a bunch of member functions in a class, but don't need to refer to them elsewhere (eg: those member functions aren't defined in this translation unit)) - that'd help readability/writeability, but probably wouldn't impact FileCheck.<br><br>Also the whole "metadata " prefix on anything is a bit verbose, if we can omit that in some/many places, that'll help reduce the visual noise/improve readability/writeability.<br><br>But most/all of that I imagine is fairly easily incremental change/improvement, nothing fundamental that needs to be chosen up-front.<br> </div></div><br></div></div>