[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
Eric Christopher
echristo at gmail.com
Mon Oct 13 19:01:58 PDT 2014
On Mon, Oct 13, 2014 at 6:59 PM, Sean Silva <chisophugis at gmail.com> wrote:
> For those interested, I've attached some pie charts based on Duncan's data
> in one of the other posts; successive slides break down the usage
> increasingly finely. To my understanding, they represent the number of
> Values (and subclasses) allocated.
>
> On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>>
>> In r219010, I merged integer and string fields into a single header
>> field. By reducing the number of metadata operands used in debug info,
>> this saved 2.2GB on an `llvm-lto` bootstrap. I've done some profiling
>> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and
>> I've concluded that they will be insufficient.
>>
>> Instead, I'd like to implement a more aggressive plan, which as a
>> side-effect cleans up the much "loved" debug info IR assembly syntax.
>>
>> At a high-level, the idea is to create distinct subclasses of `Value`
>> for each debug info concept, starting with line table entries and moving
>> on to the DIDescriptor hierarchy. By leveraging the use-list
>> infrastructure for metadata operands -- i.e., only using value handles
>> for non-metadata operands -- we'll improve memory usage and increase
>> RAUW speed.
>>
>> My rough plan follows. I quote some numbers for memory savings below
>> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`
>> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's
>> -save-temps option) that currently peaks at 15.3GB.
>
>
> Stupid question, but when I was working on LTO last summer, the primary
> culprit for excessive memory use was that we weren't being smart when
> linking the IR together (Espindola would know more details). Do we still
> have that problem? For starters, how does the memory usage of just
> llvm-link compare to the memory usage of the actual LTO run? If the issue
> I was seeing last summer is still there, you should see that the
> invocation of llvm-link is actually the most memory-intensive part of the
> LTO step, by far.
>
This is vague. Could you be more specific about where you saw all of the memory?
-eric
>
> Also, you seem to really like saying "peak" here. Is there a definite peak?
> When does it occur?
>
>
>>
>>
>> 1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s
>> must all be metadata. The cost per operand is 1 pointer, vs. 4
>> pointers in an `MDNode`.
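>>
>> For illustration, a standalone toy model of that cost claim (the
>> struct names here are made up, and this is not how `Use` or
>> `MDNodeOperand` are actually spelled -- it just shows the intended
>> per-operand footprint):
>>
>>     struct MDNodeStub {};               // stand-in for llvm::MDNode
>>
>>     struct CurrentOperandModel {        // rough model of today's
>>       void *ValueHandleBits;            // MDNode operand ("4 pointers")
>>       void *UseListPrev;
>>       void *UseListNext;
>>       MDNodeStub *MD;
>>     };
>>
>>     struct ProposedOperandModel {       // an MDUser metadata operand
>>       MDNodeStub *MD;                   // ("1 pointer")
>>     };
>>
>>     static_assert(sizeof(CurrentOperandModel) == 4 * sizeof(void *), "");
>>     static_assert(sizeof(ProposedOperandModel) == 1 * sizeof(void *), "");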
>>
>> 2. Create `MDLineTable` as the first subclass of `MDUser`. Use normal
>> fields (not `Value`s) for the line and column, and use `Use`
>> operands for the metadata operands.
>>
>> On x86-64, this will save 104B / line table entry. Linking
>> `llvm-lto` uses ~7M line-table entries, so this on its own saves
>> ~700MB.
>>
>>
>> Sketch of class definition:
>>
>>     class MDLineTable : public MDUser {
>>       unsigned Line;
>>       unsigned Column;
>>     public:
>>       static MDLineTable *get(unsigned Line, unsigned Column, MDNode *Scope);
>>       static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);
>>       static MDLineTable *getBase(MDLineTable *Inlined);
>>
>>       unsigned getLine() const { return Line; }
>>       unsigned getColumn() const { return Column; }
>>       bool isInlined() const { return getNumOperands() == 2; }
>>       MDNode *getScope() const { return getOperand(0); }
>>       MDNode *getInlinedAt() const { return getOperand(1); }
>>     };
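>>
>> For illustration, usage against that sketch might look like the
>> following (hypothetical -- these factories don't exist in tree, and
>> uniquing behaviour is assumed):
>>
>>     #include <cassert>
>>
>>     // Hypothetical usage of the sketched MDLineTable API above.
>>     static void example(MDNode *Scope, MDNode *InlinedScope) {
>>       MDLineTable *Loc = MDLineTable::get(/*Line=*/45, /*Column=*/7, Scope);
>>       assert(!Loc->isInlined() && Loc->getLine() == 45 &&
>>              Loc->getScope() == Scope);
>>
>>       // Per the sketch, the inlined form carries a second MDNode operand.
>>       MDLineTable *Inlined = MDLineTable::getInlined(Loc, InlinedScope);
>>       assert(Inlined->isInlined());
>>
>>       // Assuming both forms are uniqued, the base round-trips.
>>       assert(MDLineTable::getBase(Inlined) == Loc);
>>     }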
>>
>> Proposed assembly syntax:
>>
>>     ; Not inlined.
>>     !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)
>>
>>     ; Inlined.
>>     !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
>>                                inlinedAt: metadata !10)
>>
>>     ; Column defaulted to 0.
>>     !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
>>
>> (What colour should that bike shed be?)
>>
>> 3. (Optional) Rewrite `DebugLoc` lookup tables. My profiling shows
>> that we have 3.5M entries in the `DebugLoc` side-vectors for 7M line
>> table entries. The cost of these is ~180B each, for another
>> ~600MB.
>>
>> If we integrate a side-table of `MDLineTable`s into its uniquing,
>> the overhead is only ~12B / line table entry, or ~80MB. This saves
>> 520MB.
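>>
>> Spelling out that arithmetic (figures rounded as above):
>>
>>     constexpr double SideVectorEntries = 3.5e6;
>>     constexpr double LineTableEntries = 7.0e6;
>>     constexpr double CurrentBytes = SideVectorEntries * 180;  // ~630MB ("~600MB")
>>     constexpr double ProposedBytes = LineTableEntries * 12;   // ~84MB ("~80MB")
>>     constexpr double SavedBytes = CurrentBytes - ProposedBytes;
>>     // ~546MB exactly; ~520MB using the rounded figures above.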
>>
>> This is somewhat orthogonal to redesigning the metadata format,
>> but IMO it's worth doing as soon as possible.
>>
>> 4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`
>> through an intermediate class `DebugMDNode` with an
>> allocation-time-optional `CallbackVH` available for referencing
>> non-metadata. Change `DIDescriptor` to wrap a `DebugMDNode` instead
>> of an `MDNode`.
>>
>> This saves another ~960MB, for a running total of ~2GB.
>
>
> 2GB (out of 15.3GB, i.e. ~13%) seems like pretty pathetic savings when we
> have a single pie slice near 40% of the # of Values allocated and another
> at 21%. Especially with this being "step 4".
>
> As a rough back of the envelope calculation, dividing 15.3GB by ~24 million
> Values gives about 600 bytes per Value. That seems sort of excessive (but is
> it realistic?). All of the data types that you are proposing to shrink fall
> far short of this "average size", meaning that if you are trying to reduce
> memory usage, you might be looking in the wrong place. Something smells
> fishy. At the very least, this would indicate that the real memory usage is
> elsewhere.
>
> A pie chart breaking down the total memory usage seems essential to have
> here.
>
>>
>>
>> Proposed assembly syntax:
>>
>>     !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
>>                                       fields: "0\00clang 3.6\00...",
>>                                       operands: { metadata !8, ... })
>>
>>     !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
>>                                       fields: "global_var\00...",
>>                                       operands: { metadata !8, ... },
>>                                       handle: i32* @global_var)
>>
>> This syntax pulls the tag out of the current header-string, calls
>> the rest of the header "fields", and includes the metadata operands
>> in "operands".
>>
>> 5. Incrementally create subclasses of `DebugMDNode`, such as
>> `MDCompileUnit` and `MDSubprogram`. Sub-classed nodes replace the
>> "fields" and "operands" catch-alls with explicit names for each
>> operand.
>>
>> Proposed assembly syntax:
>>
>>     !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",
>>                                 linkageName: "_Z3foov", file: metadata !8,
>>                                 function: i32 (i32)* @foo)
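>>
>> And a matching class sketch (again hypothetical; the accessor list
>> would mirror what DISubprogram exposes today):
>>
>>     class MDSubprogram : public DebugMDNode {
>>       unsigned Line;
>>     public:
>>       unsigned getLine() const { return Line; }
>>       StringRef getName() const;
>>       StringRef getDisplayName() const;
>>       StringRef getLinkageName() const;
>>       MDNode *getFile() const;
>>       Function *getFunction() const; // via the CallbackVH handle
>>     };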
>>
>> 6. Remove the dead code for `GenericDebugMDNode`.
>>
>> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
>> traffic during bitcode serialization. Now that metadata types are
>> known, we can write debug info out in an order that makes it cheap
>> to read back in.
>>
>> Note that using `MDUser` will make RAUW much cheaper, since we're
>> using the use-list infrastructure for most of them. If RAUW isn't
>> showing up in a profile, I may skip this.
>>
>> Does this direction seem reasonable? Any major problems I've missed?
>
>
> You need more data. Right now you have essentially one data point, and it's
> not even clear what you measured really. If your goal is saving memory, I
> would expect at least a pie chart that breaks down LLVM's memory usage (not
> just # of allocations of different sorts; an approximation is fine, as long
> as you explain how you arrived at it and in what sense it approximates the
> true number).
>
> Do the numbers change significantly for different projects? (e.g. Chromium
> or Firefox or a kernel or a large app you have handy to compile with LTO?).
> If you have specific data you want (and a suggestion for how to gather it),
> I can also get your numbers for one of our internal games as well.
>
> Once you have some more data, then as a first step, I would like to see an
> analysis of how much we can "ideally" expect to gain (back of the envelope
> calculations == win).
>
> -- Sean Silva