[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

Reid Kleckner rnk at google.com
Mon Oct 13 15:37:08 PDT 2014


I think making debug info more of a first-class IR citizen is probably the
way to go. Right now debug info is completely unreadable and is downright
opposed to the design goals of the IR as I understand them.

Our backwards compatibility policy should give you the flexibility you need
to update the debug info representation as you go along:
http://llvm.org/docs/DeveloperPolicy.html#id18

On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

> In r219010, I merged integer and string fields into a single header
> field.  By reducing the number of metadata operands used in debug info,
> this saved 2.2GB on an `llvm-lto` bootstrap.  I've done some profiling
> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and
> I've concluded that they will be insufficient.
>
> Instead, I'd like to implement a more aggressive plan, which as a
> side-effect cleans up the much "loved" debug info IR assembly syntax.
>
> At a high level, the idea is to create distinct subclasses of `Value`
> for each debug info concept, starting with line table entries and moving
> on to the DIDescriptor hierarchy.  By leveraging the use-list
> infrastructure for metadata operands -- i.e., only using value handles
> for non-metadata operands -- we'll improve memory usage and increase
> RAUW speed.
>
> My rough plan follows.  I quote some numbers for memory savings below
> based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`
> on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's
> -save-temps option) that currently peaks at 15.3GB.
>
>  1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s
>     must all be metadata.  The cost per operand is 1 pointer, vs. 4
>     pointers in an `MDNode`.
>
>  2. Create `MDLineTable` as the first subclass of `MDUser`.  Use normal
>     fields (not `Value`s) for the line and column, and use `Use`
>     operands for the metadata operands.
>
>     On x86-64, this will save 104B / line table entry.  Linking
>     `llvm-lto` uses ~7M line-table entries, so this on its own saves
>     ~700MB.
>
>     Sketch of class definition:
>
>         class MDLineTable : public MDUser {
>           unsigned Line;
>           unsigned Column;
>         public:
>           static MDLineTable *get(unsigned Line, unsigned Column,
>                                   MDNode *Scope);
>           static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);
>           static MDLineTable *getBase(MDLineTable *Inlined);
>
>           unsigned getLine() const { return Line; }
>           unsigned getColumn() const { return Column; }
>           bool isInlined() const { return getNumOperands() == 2; }
>           MDNode *getScope() const { return getOperand(0); }
>           MDNode *getInlinedAt() const { return getOperand(1); }
>         };
>
>     Proposed assembly syntax:
>
>         ; Not inlined.
>         !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)
>
>         ; Inlined.
>         !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
>                                    inlinedAt: metadata !10)
>
>         ; Column defaulted to 0.
>         !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
>
>     (What colour should that bike shed be?)
>
>  3. (Optional) Rewrite `DebugLoc` lookup tables.  My profiling shows
>     that we have 3.5M entries in the `DebugLoc` side-vectors for 7M line
>     table entries.  The cost of these is ~180B each, for another
>     ~600MB.
>
>     If we integrate a side-table of `MDLineTable`s into its uniquing,
>     the overhead is only ~12B / line table entry, or ~80MB.  This saves
>     520MB.
>
>     This is somewhat orthogonal to redesigning the metadata format,
>     but IMO it's worth doing as soon as it's possible.
>
>  4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`
>     through an intermediate class `DebugMDNode` with an
>     allocation-time-optional `CallbackVH` available for referencing
>     non-metadata.  Change `DIDescriptor` to wrap a `DebugMDNode` instead
>     of an `MDNode`.
>
>     This saves another ~960MB, for a running total of ~2GB.
>
>     Proposed assembly syntax:
>
>         !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
>                                           fields: "0\00clang 3.6\00...",
>                                           operands: { metadata !8, ... })
>
>         !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
>                                           fields: "global_var\00...",
>                                           operands: { metadata !8, ... },
>                                           handle: i32* @global_var)
>
>     This syntax pulls the tag out of the current header-string, calls
>     the rest of the header "fields", and includes the metadata operands
>     in "operands".
>
>  5. Incrementally create subclasses of `DebugMDNode`, such as
>     `MDCompileUnit` and `MDSubprogram`.  Sub-classed nodes replace the
>     "fields" and "operands" catch-alls with explicit names for each
>     operand.
>
>     Proposed assembly syntax:
>
>         !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",
>                                     linkageName: "_Z3foov", file: metadata !8,
>                                     function: i32 (i32)* @foo)
>
>  6. Remove the dead code for `GenericDebugMDNode`.
>
>  7. (Optional) Refactor `DebugMDNode` sub-classes to minimize RAUW
>     traffic during bitcode serialization.  Now that metadata types are
>     known, we can write debug info out in an order that makes it cheap
>     to read back in.
>
>     Note that using `MDUser` will make RAUW much cheaper, since most
>     operands will go through the use-list infrastructure.  If RAUW isn't
>     showing up in a profile, I may skip this.
>
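To make the shape of step 2 concrete, here is a minimal standalone sketch of the "plain fields plus metadata operands" layout. It does not use LLVM headers: `Node` and `LineEntry` are hypothetical stand-ins for `MDNode` and the proposed `MDLineTable`, and the `std::vector` of operands is only an illustration, not the `Use`-based storage the RFC describes. Returning `nullptr` from `getInlinedAt()` for a non-inlined entry is my assumption; the quoted sketch does not say what should happen there.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a metadata node (not the real llvm::MDNode).
struct Node {
  const char *Name;
};

// Hypothetical stand-in for the proposed MDLineTable: line and column are
// ordinary integer fields, and only the scope and the optional inlined-at
// location are kept as operands.
class LineEntry {
  unsigned Line;
  unsigned Column;
  std::vector<const Node *> Operands; // [scope] or [scope, inlinedAt]

public:
  LineEntry(unsigned Line, unsigned Column, const Node *Scope,
            const Node *InlinedAt = nullptr)
      : Line(Line), Column(Column) {
    Operands.push_back(Scope);
    if (InlinedAt)
      Operands.push_back(InlinedAt);
  }

  unsigned getLine() const { return Line; }
  unsigned getColumn() const { return Column; }

  // Mirrors the quoted sketch: an entry is inlined iff it has two operands.
  bool isInlined() const { return Operands.size() == 2; }

  const Node *getScope() const { return Operands[0]; }

  // Assumption: a non-inlined entry reports no inlined-at location.
  const Node *getInlinedAt() const {
    return isInlined() ? Operands[1] : nullptr;
  }
};
```

With this layout, `LineEntry(45, 7, &Scope)` corresponds to the "Not inlined" assembly form and `LineEntry(45, 7, &Scope, &CallSite)` to the "Inlined" one; reading the line or column never touches an operand.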
> Does this direction seem reasonable?  Any major problems I've missed?
>
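As a sanity check on the estimates in steps 2 and 3, the arithmetic can be written out as below. The entry counts and per-entry byte costs are the approximate figures quoted above; the function names and the use of decimal megabytes (1 MB = 1,000,000 bytes) are my assumptions. The unrounded products come out slightly higher than the rounded numbers in the text (for example 546 MB rather than 520 MB for step 3), which is consistent with the "~" on each figure.

```cpp
#include <cassert>
#include <cstdint>

// Approximate figures quoted in the proposal.
constexpr std::uint64_t LineTableEntries = 7000000;  // ~7M line-table entries
constexpr std::uint64_t SavedBytesPerEntry = 104;    // step 2, on x86-64
constexpr std::uint64_t DebugLocEntries = 3500000;   // 3.5M side-vector entries
constexpr std::uint64_t DebugLocBytesEach = 180;     // ~180B each
constexpr std::uint64_t SideTableBytesPerEntry = 12; // ~12B / line table entry

// Assumption: decimal megabytes, truncated toward zero.
constexpr std::uint64_t toMB(std::uint64_t Bytes) { return Bytes / 1000000; }

// Step 2: memory saved by storing line and column as plain fields.
constexpr std::uint64_t lineTableSavingsMB() {
  return toMB(LineTableEntries * SavedBytesPerEntry);
}

// Step 3: current cost of the DebugLoc side-vectors.
constexpr std::uint64_t debugLocCurrentMB() {
  return toMB(DebugLocEntries * DebugLocBytesEach);
}

// Step 3: cost of a side-table integrated into uniquing.
constexpr std::uint64_t sideTableMB() {
  return toMB(LineTableEntries * SideTableBytesPerEntry);
}

// Step 3: net saving from replacing the side-vectors.
constexpr std::uint64_t debugLocSavingsMB() {
  return debugLocCurrentMB() - sideTableMB();
}

static_assert(lineTableSavingsMB() == 728, "step 2: quoted as ~700MB");
static_assert(debugLocCurrentMB() == 630, "step 3: quoted as ~600MB");
static_assert(sideTableMB() == 84, "step 3: quoted as ~80MB");
```

Adding the ~960MB quoted for step 4 to these two savings lands a little above 2GB, in line with the "running total of ~2GB".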