<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 15, 2014 at 2:31 PM, Eric Christopher <span dir="ltr"><<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>
><br>
><br>
> On Mon, Oct 13, 2014 at 7:01 PM, Eric Christopher <<a href="mailto:echristo@gmail.com">echristo@gmail.com</a>><br>
> wrote:<br>
>><br>
>> On Mon, Oct 13, 2014 at 6:59 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>
>> > For those interested, I've attached some pie charts based on Duncan's<br>
>> > data in one of the other posts; successive slides break down the usage<br>
>> > increasingly finely. To my understanding, they represent the number of<br>
>> > Values (and their subclasses) allocated.<br>
>> ><br>
>> > On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith<br>
>> > <<a href="mailto:dexonsmith@apple.com">dexonsmith@apple.com</a>> wrote:<br>
>> >><br>
>> >> In r219010, I merged integer and string fields into a single header<br>
>> >> field. By reducing the number of metadata operands used in debug info,<br>
>> >> this saved 2.2GB on an `llvm-lto` bootstrap. I've done some profiling<br>
>> >> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and<br>
>> >> I've concluded that they will be insufficient.<br>
>> >><br>
>> >> Instead, I'd like to implement a more aggressive plan, which as a<br>
>> >> side-effect cleans up the much "loved" debug info IR assembly syntax.<br>
>> >><br>
>> >> At a high level, the idea is to create distinct subclasses of `Value`<br>
>> >> for each debug info concept, starting with line table entries and<br>
>> >> moving on to the DIDescriptor hierarchy. By leveraging the use-list<br>
>> >> infrastructure for metadata operands -- i.e., only using value handles<br>
>> >> for non-metadata operands -- we'll improve memory usage and increase<br>
>> >> RAUW speed.<br>
>> >><br>
>> >> My rough plan follows. I quote some numbers for memory savings below<br>
>> >> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`<br>
>> >> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's<br>
>> >> -save-temps option) that currently peaks at 15.3GB.<br>
>> ><br>
>> ><br>
>> > Stupid question, but when I was working on LTO last summer, the primary<br>
>> > culprit for excessive memory use was that we were not being smart when<br>
>> > linking the IR together (Espindola would know more details). Do we still<br>
>> > have that problem? For starters, how does the memory usage of just<br>
>> > llvm-link compare to the memory usage of the actual LTO run? If the issue<br>
>> > I was seeing last summer is still there, you should see that the<br>
>> > invocation of llvm-link is actually the most memory-intensive part of the<br>
>> > LTO step, by far.<br>
>> ><br>
>><br>
>> This is vague. Could you be more specific on where you saw all of the<br>
>> memory?<br>
><br>
><br>
> Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;<br>
> without -g it completed with much less memory). The increase could easily<br>
> be watched on the system "process monitor" in real time.<br>
><br>
<br>
</div></div>This is likely what we've already discussed; it was handled a<br>
long while ago now.<br>
<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div><div>I was reading the thread in sequential order (and replying without finishing). derp.</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="HOEnZb"><font color="#888888">
-eric<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
> -- Sean Silva<br>
><br>
>><br>
>><br>
>> -eric<br>
>><br>
>> ><br>
>> > Also, you seem to really like saying "peak" here. Is there a definite<br>
>> > peak? When does it occur?<br>
>> ><br>
>> ><br>
>> >><br>
>> >><br>
>> >> 1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s<br>
>> >> must all be metadata. The cost per operand is 1 pointer, vs. 4<br>
>> >> pointers in an `MDNode`.<br>
>> >><br>
>> >> 2. Create `MDLineTable` as the first subclass of `MDUser`. Use normal<br>
>> >> fields (not `Value`s) for the line and column, and use `Use`<br>
>> >> operands for the metadata operands.<br>
>> >><br>
>> >> On x86-64, this will save 104B / line table entry. Linking<br>
>> >> `llvm-lto` uses ~7M line-table entries, so this on its own saves<br>
>> >> ~700MB.<br>
>> >><br>
>> >><br>
>> >> Sketch of class definition:<br>
>> >><br>
>> >>     class MDLineTable : public MDUser {<br>
>> >>       unsigned Line;<br>
>> >>       unsigned Column;<br>
>> >>     public:<br>
>> >>       static MDLineTable *get(unsigned Line, unsigned Column, MDNode *Scope);<br>
>> >>       static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);<br>
>> >>       static MDLineTable *getBase(MDLineTable *Inlined);<br>
>> >><br>
>> >>       unsigned getLine() const { return Line; }<br>
>> >>       unsigned getColumn() const { return Column; }<br>
>> >>       bool isInlined() const { return getNumOperands() == 2; }<br>
>> >>       MDNode *getScope() const { return getOperand(0); }<br>
>> >>       MDNode *getInlinedAt() const { return getOperand(1); }<br>
>> >>     };<br>
>> >><br>
>> >> Proposed assembly syntax:<br>
>> >><br>
>> >>     ; Not inlined.<br>
>> >>     !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)<br>
>> >><br>
>> >>     ; Inlined.<br>
>> >>     !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,<br>
>> >>                                inlinedAt: metadata !10)<br>
>> >><br>
>> >>     ; Column defaulted to 0.<br>
>> >>     !7 = metadata !MDLineTable(line: 45, scope: metadata !9)<br>
>> >><br>
>> >> (What colour should that bike shed be?)<br>
>> >><br>
>> >> 3. (Optional) Rewrite `DebugLoc` lookup tables. My profiling shows<br>
>> >> that we have 3.5M entries in the `DebugLoc` side-vectors for 7M<br>
>> >> line-table entries. The cost of these is ~180B each, for another<br>
>> >> ~600MB.<br>
>> >><br>
>> >> If we integrate a side-table of `MDLineTable`s into its uniquing,<br>
>> >> the overhead is only ~12B / line-table entry, or ~80MB. This saves<br>
>> >> ~520MB.<br>
>> >><br>
>> >> This is somewhat orthogonal to redesigning the metadata format,<br>
>> >> but IMO it's worth doing as soon as possible.<br>
>> >><br>
>> >> 4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`<br>
>> >> through an intermediate class `DebugMDNode` with an<br>
>> >> allocation-time-optional `CallbackVH` available for referencing<br>
>> >> non-metadata. Change `DIDescriptor` to wrap a `DebugMDNode` instead<br>
>> >> of an `MDNode`.<br>
>> >><br>
>> >> This saves another ~960MB, for a running total of ~2GB.<br>
>> ><br>
>> ><br>
>> > 2GB (out of 15.3GB, i.e. ~13%) seems like pretty pathetic savings when<br>
>> > we have a single pie slice near 40% of the # of Values allocated and<br>
>> > another at 21%. Especially this being "step 4".<br>
>> ><br>
>> > As a rough back-of-the-envelope calculation, dividing 15.3GB by ~24<br>
>> > million Values gives about 600 bytes per Value. That seems sort of<br>
>> > excessive (but is it realistic?). All of the data types that you are<br>
>> > proposing to shrink fall far short of this "average size", meaning that<br>
>> > if you are trying to reduce memory usage, you might be looking in the<br>
>> > wrong place. Something smells fishy. At the very least, this would<br>
>> > indicate that the real memory usage is elsewhere.<br>
>> ><br>
>> > A pie chart breaking down the total memory usage seems essential to<br>
>> > have here.<br>
>> ><br>
>> >><br>
>> >><br>
>> >> Proposed assembly syntax:<br>
>> >><br>
>> >>     !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,<br>
>> >>                                       fields: "0\00clang 3.6\00...",<br>
>> >>                                       operands: { metadata !8, ... })<br>
>> >><br>
>> >>     !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,<br>
>> >>                                       fields: "global_var\00...",<br>
>> >>                                       operands: { metadata !8, ... },<br>
>> >>                                       handle: i32* @global_var)<br>
>> >><br>
>> >> This syntax pulls the tag out of the current header-string, calls<br>
>> >> the rest of the header "fields", and includes the metadata operands<br>
>> >> in "operands".<br>
>> >><br>
>> >> 5. Incrementally create subclasses of `DebugMDNode`, such as<br>
>> >> `MDCompileUnit` and `MDSubprogram`. Sub-classed nodes replace the<br>
>> >> "fields" and "operands" catch-alls with explicit names for each<br>
>> >> operand.<br>
>> >><br>
>> >> Proposed assembly syntax:<br>
>> >><br>
>> >>     !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",<br>
>> >>                                 linkageName: "_Z3foov", file: metadata !8,<br>
>> >>                                 function: i32 (i32)* @foo)<br>
>> >><br>
>> >> 6. Remove the dead code for `GenericDebugMDNode`.<br>
>> >><br>
>> >> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW<br>
>> >> traffic during bitcode serialization. Now that metadata types are<br>
>> >> known, we can write debug info out in an order that makes it cheap<br>
>> >> to read back in.<br>
>> >><br>
>> >> Note that using `MDUser` will make RAUW much cheaper, since we're<br>
>> >> using the use-list infrastructure for most of them. If RAUW isn't<br>
>> >> showing up in a profile, I may skip this.<br>
>> >><br>
>> >> Does this direction seem reasonable? Any major problems I've missed?<br>
>> ><br>
>> ><br>
>> > You need more data. Right now you have essentially one data point, and<br>
>> > it's not even clear what you really measured. If your goal is saving<br>
>> > memory, I would expect at least a pie chart that breaks down LLVM's<br>
>> > memory usage (not just the # of allocations of different sorts; an<br>
>> > approximation is fine, as long as you explain how you arrived at it and<br>
>> > in what sense it approximates the true number).<br>
>> ><br>
>> > Do the numbers change significantly for different projects (e.g.<br>
>> > Chromium, Firefox, a kernel, or a large app you have handy to compile<br>
>> > with LTO)? If you have specific data you want (and a suggestion for how<br>
>> > to gather it), I can also get you numbers for one of our internal games.<br>
>> ><br>
>> > Once you have some more data, then as a first step I would like to see<br>
>> > an analysis of how much we can "ideally" expect to gain<br>
>> > (back-of-the-envelope calculations == win).<br>
>> ><br>
>> > -- Sean Silva<br>
>> ><br>
>> >><br>
>> >><br>
>> >> _______________________________________________<br>
>> >> LLVM Developers mailing list<br>
>> >> <a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
>> >> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
>> ><br>
>> ><br>
><br>
><br>
</div></div></blockquote></div><br></div></div>