[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
Sean Silva
chisophugis at gmail.com
Mon Oct 13 18:59:48 PDT 2014
For those interested, I've attached some pie charts based on Duncan's data
from one of the other posts; successive slides break the usage down more
and more finely. To my understanding, they represent the number of
`Value`s (and subclasses) allocated.
On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
> In r219010, I merged integer and string fields into a single header
> field. By reducing the number of metadata operands used in debug info,
> this saved 2.2GB on an `llvm-lto` bootstrap. I've done some profiling
> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and
> I've concluded that they will be insufficient.
>
> Instead, I'd like to implement a more aggressive plan, which as a
> side-effect cleans up the much "loved" debug info IR assembly syntax.
>
> At a high-level, the idea is to create distinct subclasses of `Value`
> for each debug info concept, starting with line table entries and moving
> on to the DIDescriptor hierarchy. By leveraging the use-list
> infrastructure for metadata operands -- i.e., only using value handles
> for non-metadata operands -- we'll improve memory usage and increase
> RAUW speed.
>
> My rough plan follows. I quote some numbers for memory savings below
> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`
> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's
> -save-temps option) that currently peaks at 15.3GB.
>
Stupid question, but when I was working on LTO last summer, the primary
cause of excessive memory use was that we were not being smart when linking
the IR together (Espindola would know more details). Do we still have that
problem? For starters, how does the memory usage of just llvm-link compare
to the memory usage of the actual LTO run? If the issue I was seeing last
summer is still there, you should see that the invocation of llvm-link is
by far the most memory-intensive part of the LTO step.
Also, you seem to really like saying "peak" here. Is there a definite peak?
When does it occur?
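One way to answer the llvm-link question is to run each tool separately and compare the peak resident set size the kernel reports for each child process. A minimal sketch, assuming a POSIX system that provides `fork`, `execvp`, and `wait4`; the helper name `peakRSSOfCommand` is made up for illustration, and the units of `ru_maxrss` differ by platform (kilobytes on Linux, bytes on OS X):

```cpp
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not part of LLVM.
// Runs argv[0] with the given NULL-terminated argument vector and returns
// the child's peak resident set size as reported by the kernel, or -1 if
// the child could not be run or exited with a non-zero status.
// Units are platform-specific: kilobytes on Linux, bytes on OS X.
long peakRSSOfCommand(char *const argv[]) {
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    execvp(argv[0], argv);
    _exit(127); // Only reached if exec failed.
  }
  int status = 0;
  struct rusage usage;
  // wait4 fills in the resource usage of exactly this child.
  if (wait4(pid, &status, 0, &usage) < 0)
    return -1;
  if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
    return -1;
  return usage.ru_maxrss;
}
```

Calling it once with an llvm-link command line and once with the full llvm-lto command line gives two numbers to compare. It only reports the high-water mark, not when the peak occurs; that would need a heap profiler or periodic sampling.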
>
> 1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s
> must all be metadata. The cost per operand is 1 pointer, vs. 4
> pointers in an `MDNode`.
>
> 2. Create `MDLineTable` as the first subclass of `MDUser`. Use normal
> fields (not `Value`s) for the line and column, and use `Use`
> operands for the metadata operands.
>
> On x86-64, this will save 104B / line table entry. Linking
> `llvm-lto` uses ~7M line-table entries, so this on its own saves
> ~700MB.
> Sketch of class definition:
>
>     class MDLineTable : public MDUser {
>       unsigned Line;
>       unsigned Column;
>     public:
>       static MDLineTable *get(unsigned Line, unsigned Column,
>                               MDNode *Scope);
>       static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);
>       static MDLineTable *getBase(MDLineTable *Inlined);
>
>       unsigned getLine() const { return Line; }
>       unsigned getColumn() const { return Column; }
>       bool isInlined() const { return getNumOperands() == 2; }
>       MDNode *getScope() const { return getOperand(0); }
>       MDNode *getInlinedAt() const { return getOperand(1); }
>     };
>
> Proposed assembly syntax:
>
> ; Not inlined.
> !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)
>
> ; Inlined.
> !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
> inlinedAt: metadata !10)
>
> ; Column defaulted to 0.
> !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
>
> (What colour should that bike shed be?)
>
> 3. (Optional) Rewrite `DebugLoc` lookup tables. My profiling shows
> that we have 3.5M entries in the `DebugLoc` side-vectors for 7M line
> table entries. The cost of these is ~180B each, for another
> ~600MB.
>
> If we integrate a side-table of `MDLineTable`s into its uniquing,
> the overhead is only ~12B / line table entry, or ~80MB. This saves
> 520MB.
>
> This is somewhat perpendicular to redesigning the metadata format,
> but IMO it's worth doing as soon as it's possible.
>
> 4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`
> through an intermediate class `DebugMDNode` with an
> allocation-time-optional `CallbackVH` available for referencing
> non-metadata. Change `DIDescriptor` to wrap a `DebugMDNode` instead
> of an `MDNode`.
>
> This saves another ~960MB, for a running total of ~2GB.
>
2GB (out of 15.3GB, i.e. ~13%) seems like pretty pathetic savings when a
single pie slice accounts for nearly 40% of the number of `Value`s allocated
and another for 21%, especially as this is already "step 4".
As a rough back-of-the-envelope calculation, dividing 15.3GB by ~24 million
`Value`s gives a bit over 600 bytes per `Value`. That seems sort of excessive
(but is it realistic?). All of the data types that you are proposing to shrink
fall far short of this "average size", meaning that if you are trying to
reduce memory usage, you might be looking in the wrong place. Something
smells fishy. At the very least, this would indicate that the real memory
usage is elsewhere.
A pie chart breaking down the total memory usage seems essential to have
here.
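The arithmetic behind those two figures can be spelled out as below. This is only a sketch: the constant and function names are mine, and it assumes "GB" means 10^9 bytes, which the thread does not state.

```cpp
// Assumption: "GB" is read as 10^9 bytes; the thread does not say which.
constexpr double kBytesPerGB = 1e9;

// Figures taken from the thread.
constexpr double kPeakBytes = 15.3 * kBytesPerGB;          // reported peak
constexpr double kNumValues = 24e6;                        // ~24 million Values
constexpr double kRunningSavings = 2.0 * kBytesPerGB;      // total after step 4

// Average memory attributed to each Value if all of the peak were Values.
constexpr double averageBytesPerValue(double totalBytes, double numValues) {
  return totalBytes / numValues;
}

// Share of `whole` that `part` represents, in percent.
constexpr double percentOf(double part, double whole) {
  return 100.0 * part / whole;
}

// Roughly 637 bytes per Value, far above the per-entry sizes being trimmed.
static_assert(averageBytesPerValue(kPeakBytes, kNumValues) > 600.0,
              "average is above 600 bytes");
static_assert(averageBytesPerValue(kPeakBytes, kNumValues) < 650.0,
              "average is below 650 bytes");

// The running total of savings is about 13% of the peak.
static_assert(percentOf(kRunningSavings, kPeakBytes) > 13.0 &&
                  percentOf(kRunningSavings, kPeakBytes) < 13.2,
              "savings are about 13% of peak");
```

If the average really is that high, most of the bytes must sit in something other than the small fixed-size objects being counted, which is why a breakdown by bytes rather than by allocation count matters.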
>
> Proposed assembly syntax:
>
> !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
> fields: "0\00clang 3.6\00...",
> operands: { metadata !8, ... })
>
> !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
> fields: "global_var\00...",
> operands: { metadata !8, ... },
> handle: i32* @global_var)
>
> This syntax pulls the tag out of the current header-string, calls
> the rest of the header "fields", and includes the metadata operands
> in "operands".
>
> 5. Incrementally create subclasses of `DebugMDNode`, such as
> `MDCompileUnit` and `MDSubprogram`. Sub-classed nodes replace the
> "fields" and "operands" catch-alls with explicit names for each
> operand.
>
> Proposed assembly syntax:
>
> !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",
>                             linkageName: "_Z3foov", file: metadata !8,
>                             function: i32 (i32)* @foo)
>
> 6. Remove the dead code for `GenericDebugMDNode`.
>
> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
> traffic during bitcode serialization. Now that metadata types are
> known, we can write debug info out in an order that makes it cheap
> to read back in.
>
> Note that using `MDUser` will make RAUW much cheaper, since we're
> using the use-list infrastructure for most of them. If RAUW isn't
> showing up in a profile, I may skip this.
>
> Does this direction seem reasonable? Any major problems I've missed?
>
You need more data. Right now you have essentially one data point, and it's
not even clear what exactly you measured. If your goal is saving memory, I
would expect at least a pie chart that breaks down LLVM's memory usage (not
just the number of allocations of different sorts; an approximation is fine,
as long as you explain how you arrived at it and in what sense it
approximates the true number).
Do the numbers change significantly for different projects (e.g. Chromium
or Firefox or a kernel or a large app you have handy to compile with LTO)?
If you have specific data you want (and a suggestion for how to gather it),
I can get you numbers for one of our internal games as well.
Once you have some more data, then as a first step, I would like to see an
analysis of how much we can "ideally" expect to gain (back of the envelope
calculations == win).
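As an example of the kind of check I mean, the per-step estimates quoted above can be recomputed from their entry counts and per-entry sizes. A rough sketch; the `Estimate` struct and the constant names are invented here, the inputs are the figures from steps 2 and 3 of the proposal, and MB is assumed to mean 10^6 bytes:

```cpp
// Hypothetical helper type: a count of entries and a size per entry.
struct Estimate {
  double entries;
  double bytesPerEntry;
};

// Total size in MB (assumed to be 10^6 bytes).
constexpr double totalMB(Estimate e) {
  return e.entries * e.bytesPerEntry / 1e6;
}

// Step 2: ~7M line-table entries, 104B saved per entry on x86-64.
constexpr Estimate kLineTableSavings{7e6, 104.0};
// Step 3, today: 3.5M DebugLoc side-vector entries at ~180B each.
constexpr Estimate kDebugLocCurrent{3.5e6, 180.0};
// Step 3, proposed: ~12B of overhead per line-table entry.
constexpr Estimate kDebugLocProposed{7e6, 12.0};

// Savings from replacing the current DebugLoc tables with the proposed ones.
constexpr double debugLocSavingsMB() {
  return totalMB(kDebugLocCurrent) - totalMB(kDebugLocProposed);
}

// Each result is in the same ballpark as the rounded figure in the proposal.
static_assert(totalMB(kLineTableSavings) > 700.0 &&
                  totalMB(kLineTableSavings) < 750.0,
              "step 2 saves roughly 700MB");
static_assert(totalMB(kDebugLocCurrent) > 600.0 &&
                  totalMB(kDebugLocCurrent) < 650.0,
              "current DebugLoc tables cost roughly 600MB");
static_assert(debugLocSavingsMB() > 500.0 && debugLocSavingsMB() < 560.0,
              "step 3 saves a bit over 500MB");
```

The same count-times-size table, filled in for every major category of allocation, would give the "ideal" upper bound on what any redesign could save.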
-- Sean Silva
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DebugInfoSize.pdf
Type: application/pdf
Size: 108040 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141013/b1da4b87/attachment.pdf>