<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 13, 2014 at 4:30 PM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

> On Oct 13, 2014, at 3:23 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:<br>

><br>

><br>

><br>

> On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith <<a href="mailto:dexonsmith@apple.com">dexonsmith@apple.com</a>> wrote:<br>

>> In r219010, I merged integer and string fields into a single header<br>

>> field.  By reducing the number of metadata operands used in debug info,<br>

>> this saved 2.2GB on an `llvm-lto` bootstrap.  I've done some profiling<br>

>> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and<br>

>> I've concluded that they will be insufficient.<br>

>><br>

> Could you explain what your end-goal here looked like and what data you used to evaluate its insufficiency?<br>

<br>

</span>In the links of C++ programs I've looked at, most `Value`s are line<br>

tables and local variables.  E.g., for the `llvm-lto.lto.bc` case<br>

I've used for memory numbers:<br>

<br>

  - 23967800 Value<br>

      - 16837368 MDNode<br>

          - 7611669 DIDescriptor<br>

              - 4373879 DW_TAG_arg_variable<br>

              - 1341021 DW_TAG_subprogram<br>

              -  554992 DW_TAG_auto_variable<br>

              -  360390 DW_TAG_lexical_block<br>

              -  354166 DW_TAG_subroutine_type<br>

          - 7500000 line table entries<br>

      -  5850877 User<br>

      -   693869 MDString<br>

<br></blockquote><div><br></div><div>I would like to see the same thing, but where the numbers indicate total memory used in each category, instead of the count of entries in each category.</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

IIUC, line tables and local variables need to be referenced directly<br>

from the rest of the IR, so they can't be sunk into other nodes.<br>

<br>

Relevant to your question, I didn't see a way to sufficiently decrease<br>

the numbers of these (or the number of their operands).<br>

<span class=""><br>

> Just to be clear, what I was picturing was that, starting with your initial improvement, we'd string-ify more data in the records but eventually we'd start stringifying across records (eg: rolling a DW_TAG_structure_type's members into the structure type itself, one big string). In the end we'd just pull out the non-metadata references (like the llvm::Function* in the DW_TAG_subroutine_type metadata) into a table kept separately from a handful of big strings of debug info (I say a handful, as we'd keep the types separate so they could be easily deduplicated).<br>

<br>

</span>I was thinking along the same lines.  Unfortunately, there aren't<br>

enough types left for that to make a big impact.<br>

<br>

Unless you envisioned a completely different way of dealing with<br>

`@llvm.dbg.value` and `!dbg` references?<br>

<span class=""><br>

>> Instead, I'd like to implement a more aggressive plan, which as a<br>

>> side-effect cleans up the much "loved" debug info IR assembly syntax.<br>

>><br>

>> At a high-level, the idea is to create distinct subclasses of `Value`<br>

>> for each debug info concept,<br>

><br>

> My concern with this is baking parts of our current debug info representation into IR constructs seems rather heavyweight. If we need to add first class IR constructs to cope with debug info I'd hope to find, ideally, one, general purpose extension we can use for this (& possibly for other things). But maybe the bar for adding first class IR constructs is lower than I've imagined it to be.<br>

<br>

</span>Since 75% of all `Value`s are debug info, representing them well<br>

seems worthwhile to me.<br>

<span class=""><br>

>> starting with line table entries and moving<br>

>> on to the DIDescriptor hierarchy.  By leveraging the use-list<br>

>> infrastructure for metadata operands -- i.e., only using value handles<br>

>> for non-metadata operands -- we'll improve memory usage and increase<br>

>> RAUW speed.<br>

>><br>

>> My rough plan follows.<br>

<br>

</span>(Note the following sentence, which I think you missed.)<br>

<span class=""><br>

>> I quote some numbers for memory savings below<br>

>> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`<br>

>> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's<br>

>> -save-temps option) that currently peaks at 15.3GB.<br>

>><br>

>>  1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s<br>

>>     must all be metadata.  The cost per operand is 1 pointer, vs. 4<br>

>>     pointers in an `MDNode`.<br>

><br>

> Perhaps a generic MD-only-node might be a sufficiently generically valuable IR construct.<br>

><br>

> A similar alternative: A schematized metadata node. Much like DWARF, being able to say "this node is of some type T, defined elsewhere in the module - string, int, string, string, etc... ". Heck, this could even be just a generic improvement to llvm IR, maybe? (the textual representation might not need to change at all - IR Generation would just do much like DWARF generation in LLVM does - create abbreviation/type descriptions on the fly and share them rather than having every metadata node include its own self-description)<br>

><br>

<br>

</span>"Being generic" seems like a defect to me, not a feature.  If you need<br>

to add support for every IR construct to the backend to emit DIEs, etc.,<br>

then what's the benefit in being able to express arbitrary other things?<br>

<div><div class="h5"><br>

<br>

>>  2. Create `MDLineTable` as the first subclass of `MDUser`.  Use normal<br>

>>     fields (not `Value`s) for the line and column, and use `Use`<br>

>>     operands for the metadata operands.<br>

>><br>

>>     On x86-64, this will save 104B / line table entry.  Linking<br>

>>     `llvm-lto` uses ~7M line-table entries, so this on its own saves<br>

>>     ~700MB.<br>

>><br>

>>     Sketch of class definition:<br>

>><br>

>>         class MDLineTable : public MDUser {<br>

>>           unsigned Line;<br>

>>           unsigned Column;<br>

>>         public:<br>

>>           static MDLineTable *get(unsigned Line, unsigned Column,<br>

>>                                   MDNode *Scope);<br>

>>           static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);<br>

>>           static MDLineTable *getBase(MDLineTable *Inlined);<br>

>><br>

>>           unsigned getLine() const { return Line; }<br>

>>           unsigned getColumn() const { return Column; }<br>

>>           bool isInlined() const { return getNumOperands() == 2; }<br>

>>           MDNode *getScope() const { return getOperand(0); }<br>

>>           MDNode *getInlinedAt() const { return getOperand(1); }<br>

>>         };<br>

>><br>

>>     Proposed assembly syntax:<br>

>><br>

>>         ; Not inlined.<br>

>>         !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)<br>

>><br>

>>         ; Inlined.<br>

>>         !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,<br>

>>                                    inlinedAt: metadata !10)<br>

>><br>

>>         ; Column defaulted to 0.<br>

>>         !7 = metadata !MDLineTable(line: 45, scope: metadata !9)<br>

>><br>

>>     (What colour should that bike shed be?)<br>

>><br>

>>  3. (Optional) Rewrite `DebugLoc` lookup tables.  My profiling shows<br>

>>     that we have 3.5M entries in the `DebugLoc` side-vectors for 7M line<br>

>>     table entries.  The cost of these is ~180B each, for another<br>

>>     ~600MB.<br>

>><br>

>>     If we integrate a side-table of `MDLineTable`s into its uniquing,<br>

>>     the overhead is only ~12B / line table entry, or ~80MB.  This saves<br>

>>     520MB.<br>

>><br>

>>     This is somewhat perpendicular to redesigning the metadata format,<br>

>>     but IMO it's worth doing as soon as it's possible.<br>

>><br>

>>  4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`<br>

>>     through an intermediate class `DebugMDNode` with an<br>

>>     allocation-time-optional `CallbackVH` available for referencing<br>

>>     non-metadata.  Change `DIDescriptor` to wrap a `DebugMDNode` instead<br>

>>     of an `MDNode`.<br>

>><br>

>>     This saves another ~960MB,<br>

><br>

> 960 from what?<br>

<br>

</div></div>This number references the sentence noted above.<br>

<span class=""><br>

><br>

>> for a running total of ~2GB.<br>

><br>

> ~2GB is the total of what? (you mention a lot of numbers in this post, but it's not always clear what they're relative to/out of/subtracted from)<br>

<br>

</span>This number references the sentence noted above.<br>

<span class=""><br>

>><br>

>>     Proposed assembly syntax:<br>

>><br>

>>         !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,<br>

>>                                           fields: "0\00clang 3.6\00...",<br>

>>                                           operands: { metadata !8, ... })<br>

>><br>

>>         !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,<br>

>>                                           fields: "global_var\00...",<br>

>>                                           operands: { metadata !8, ... },<br>

>>                                           handle: i32* @global_var)<br>

>><br>

>>     This syntax pulls the tag out of the current header-string, calls<br>

>>     the rest of the header "fields", and includes the metadata operands<br>

>>     in "operands".<br>

>><br>

>>  5. Incrementally create subclasses of `DebugMDNode`, such as<br>

>>     `MDCompileUnit` and `MDSubprogram`.  Sub-classed nodes replace the<br>

>>     "fields" and "operands" catch-alls with explicit names for each<br>

>>     operand.<br>

><br>

> I wouldn't mind seeing how expensive it would be if these schema descriptions were within the module itself - so we didn't have to bake them into the IR spec, but could still share them between every usage within a module.<br>

<br>

</span>It's already baked into the IR spec, since the backend needs to<br>

understand debug info to emit it.  We might as well understand what<br>

exactly we're representing by formalizing it.<br>

<span class="im HOEnZb"><br>

><br>

>><br>

>>     Proposed assembly syntax:<br>

>><br>

>>         !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",<br>

>>                                     linkageName: "_Z3foov", file: metadata !8,<br>

>>                                     function: i32 (i32)* @foo)<br>

>><br>

>>  6. Remove the dead code for `GenericDebugMDNode`.<br>

>><br>

>>  7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW<br>

>>     traffic during bitcode serialization.  Now that metadata types are<br>

>>     known, we can write debug info out in an order that makes it cheap<br>

>>     to read back in.<br>

>><br>

>>     Note that using `MDUser` will make RAUW much cheaper, since we're<br>

>>     using the use-list infrastructure for most of them.  If RAUW isn't<br>

>>     showing up in a profile, I may skip this.<br>

>><br>

>> Does this direction seem reasonable?  Any major problems I've missed?<br>

</span><div class="HOEnZb"><div class="h5">

</div></div></blockquote></div><br></div></div>