[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
Eric Christopher
echristo at gmail.com
Wed Oct 15 14:31:28 PDT 2014
On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
> On Mon, Oct 13, 2014 at 7:01 PM, Eric Christopher <echristo at gmail.com>
> wrote:
>>
>> On Mon, Oct 13, 2014 at 6:59 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> > For those interested, I've attached some pie charts based on Duncan's
>> > data in one of the other posts; successive slides break down the usage
>> > increasingly finely. To my understanding, they represent the number of
>> > `Value`s (and subclasses) allocated.
>> >
>> > On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith
>> > <dexonsmith at apple.com> wrote:
>> >>
>> >> In r219010, I merged integer and string fields into a single header
>> >> field. By reducing the number of metadata operands used in debug info,
>> >> this saved 2.2GB on an `llvm-lto` bootstrap. I've done some profiling
>> >> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and
>> >> I've concluded that they will be insufficient.
>> >>
>> >> Instead, I'd like to implement a more aggressive plan, which as a
>> >> side-effect cleans up the much "loved" debug info IR assembly syntax.
>> >>
>> >> At a high-level, the idea is to create distinct subclasses of `Value`
>> >> for each debug info concept, starting with line table entries and
>> >> moving
>> >> on to the DIDescriptor hierarchy. By leveraging the use-list
>> >> infrastructure for metadata operands -- i.e., only using value handles
>> >> for non-metadata operands -- we'll improve memory usage and increase
>> >> RAUW speed.
>> >>
>> >> My rough plan follows. I quote some numbers for memory savings below
>> >> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`
>> >> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's
>> >> -save-temps option) that currently peaks at 15.3GB.
>> >
>> >
>> > Stupid question, but when I was working on LTO last summer the primary
>> > culprit for excessive memory use was that we weren't being smart when
>> > linking the IR together (Espindola would know more details). Do we still
>> > have that problem? For starters, how does the memory usage of just
>> > llvm-link compare to the memory usage of the actual LTO run? If the
>> > issue I was seeing last summer is still there, you should see that the
>> > invocation of llvm-link is actually the most memory-intensive part of
>> > the LTO step, by far.
>> >
>>
>> This is vague. Could you be more specific on where you saw all of the
>> memory?
>
>
> Running `llvm-link *.bc` would OOM a machine with 64GB of RAM (with -g;
> without -g it completed with much less). The increase could easily be
> watched in the system "process monitor" in real time.
>
This is likely what we've already discussed and was handled a long
while ago now.
-eric
> -- Sean Silva
>
>>
>>
>> -eric
>>
>> >
>> > Also, you seem to really like saying "peak" here. Is there a definite
>> > peak?
>> > When does it occur?
>> >
>> >
>> >>
>> >>
>> >> 1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s
>> >> must all be metadata. The cost per operand is 1 pointer, vs. 4
>> >> pointers in an `MDNode`.
>> >>
>> >> 2. Create `MDLineTable` as the first subclass of `MDUser`. Use normal
>> >> fields (not `Value`s) for the line and column, and use `Use`
>> >> operands for the metadata operands.
>> >>
>> >> On x86-64, this will save 104B / line table entry. Linking
>> >> `llvm-lto` uses ~7M line-table entries, so this on its own saves
>> >> ~700MB.
>> >>
>> >>
>> >>    Sketch of class definition:
>> >>
>> >>        class MDLineTable : public MDUser {
>> >>          unsigned Line;
>> >>          unsigned Column;
>> >>        public:
>> >>          static MDLineTable *get(unsigned Line, unsigned Column,
>> >>                                  MDNode *Scope);
>> >>          static MDLineTable *getInlined(MDLineTable *Base,
>> >>                                         MDNode *Scope);
>> >>          static MDLineTable *getBase(MDLineTable *Inlined);
>> >>
>> >>          unsigned getLine() const { return Line; }
>> >>          unsigned getColumn() const { return Column; }
>> >>          bool isInlined() const { return getNumOperands() == 2; }
>> >>          MDNode *getScope() const { return getOperand(0); }
>> >>          MDNode *getInlinedAt() const { return getOperand(1); }
>> >>        };
>> >>
>> >>    Proposed assembly syntax:
>> >>
>> >>        ; Not inlined.
>> >>        !7 = metadata !MDLineTable(line: 45, column: 7,
>> >>                                   scope: metadata !9)
>> >>
>> >>        ; Inlined.
>> >>        !7 = metadata !MDLineTable(line: 45, column: 7,
>> >>                                   scope: metadata !9,
>> >>                                   inlinedAt: metadata !10)
>> >>
>> >>        ; Column defaulted to 0.
>> >>        !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
>> >>
>> >> (What colour should that bike shed be?)
>> >>
>> >> 3. (Optional) Rewrite `DebugLoc` lookup tables. My profiling shows
>> >> that we have 3.5M entries in the `DebugLoc` side-vectors for 7M
>> >> line
>> >> table entries. The cost of these is ~180B each, for another
>> >> ~600MB.
>> >>
>> >> If we integrate a side-table of `MDLineTable`s into its uniquing,
>> >> the overhead is only ~12B / line table entry, or ~80MB. This saves
>> >> 520MB.
>> >>
>> >> This is somewhat orthogonal to redesigning the metadata format,
>> >> but IMO it's worth doing as soon as possible.
>> >>
>> >> 4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`
>> >> through an intermediate class `DebugMDNode` with an
>> >> allocation-time-optional `CallbackVH` available for referencing
>> >> non-metadata. Change `DIDescriptor` to wrap a `DebugMDNode`
>> >> instead
>> >> of an `MDNode`.
>> >>
>> >> This saves another ~960MB, for a running total of ~2GB.
>> >
>> >
>> > 2GB (out of 15.3GB, i.e. ~13%) seems like pretty pathetic savings when
>> > we have a single pie slice near 40% of the # of `Value`s allocated and
>> > another at 21%. Especially this being "step 4".
>> >
>> > As a rough back-of-the-envelope calculation, dividing 15.3GB by ~24
>> > million `Value`s gives about 600 bytes per `Value`. That seems sort of
>> > excessive (but is it realistic?). All of the data types that you are
>> > proposing to shrink fall far short of this "average size", meaning that
>> > if you are trying to reduce memory usage, you might be looking in the
>> > wrong place. Something smells fishy. At the very least, this would
>> > indicate that the real memory usage is elsewhere.
>> >
>> > A pie chart breaking down the total memory usage seems essential to have
>> > here.
>> >
>> >>
>> >>
>> >>    Proposed assembly syntax:
>> >>
>> >>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
>> >>                                          fields: "0\00clang 3.6\00...",
>> >>                                          operands: { metadata !8, ... })
>> >>
>> >>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
>> >>                                          fields: "global_var\00...",
>> >>                                          operands: { metadata !8, ... },
>> >>                                          handle: i32* @global_var)
>> >>
>> >> This syntax pulls the tag out of the current header-string, calls
>> >> the rest of the header "fields", and includes the metadata operands
>> >> in "operands".
>> >>
>> >> 5. Incrementally create subclasses of `DebugMDNode`, such as
>> >> `MDCompileUnit` and `MDSubprogram`. Sub-classed nodes replace the
>> >> "fields" and "operands" catch-alls with explicit names for each
>> >> operand.
>> >>
>> >>    Proposed assembly syntax:
>> >>
>> >>        !7 = metadata !MDSubprogram(line: 45, name: "foo",
>> >>                                    displayName: "foo",
>> >>                                    linkageName: "_Z3foov",
>> >>                                    file: metadata !8,
>> >>                                    function: i32 (i32)* @foo)
>> >>
>> >> 6. Remove the dead code for `GenericDebugMDNode`.
>> >>
>> >> 7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
>> >> traffic during bitcode serialization. Now that metadata types are
>> >> known, we can write debug info out in an order that makes it cheap
>> >> to read back in.
>> >>
>> >> Note that using `MDUser` will make RAUW much cheaper, since we're
>> >> using the use-list infrastructure for most of them. If RAUW isn't
>> >> showing up in a profile, I may skip this.
>> >>
>> >> Does this direction seem reasonable? Any major problems I've missed?
>> >
>> >
>> > You need more data. Right now you have essentially one data point, and
>> > it's
>> > not even clear what you measured really. If your goal is saving memory,
>> > I
>> > would expect at least a pie chart that breaks down LLVM's memory usage
>> > (not
>> > just # of allocations of different sorts; an approximation is fine, as
>> > long
>> > as you explain how you arrived at it and in what sense it approximates
>> > the
>> > true number).
>> >
>> > Do the numbers change significantly for different projects? (e.g.
>> > Chromium
>> > or Firefox or a kernel or a large app you have handy to compile with
>> > LTO?).
>> > If you have specific data you want (and a suggestion for how to gather
>> > it),
>> > I can also get your numbers for one of our internal games as well.
>> >
>> > Once you have some more data, then as a first step, I would like to see
>> > an
>> > analysis of how much we can "ideally" expect to gain (back of the
>> > envelope
>> > calculations == win).
>> >
>> > -- Sean Silva
>> >
>> >>
>> >>
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>> >
>
>