[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

Mon Oct 13 15:02:56 PDT 2014

In r219010, I merged integer and string fields into a single header
field.  By reducing the number of metadata operands used in debug info,
this saved 2.2GB on an `llvm-lto` bootstrap.  I've done some profiling
of DW_TAGs to see which parts of PR17891 and PR17892 to tackle next, and
I've concluded that they will be insufficient.

Instead, I'd like to implement a more aggressive plan, which, as a
side effect, cleans up the much "loved" debug info IR assembly syntax.

At a high level, the idea is to create distinct subclasses of `Value`
for each debug info concept, starting with line table entries and moving
on to the DIDescriptor hierarchy.  By leveraging the use-list
infrastructure for metadata operands -- i.e., only using value handles
for non-metadata operands -- we'll improve memory usage and increase
RAUW speed.

My rough plan follows.  I quote some numbers for memory savings below
based on an -flto -g bootstrap of `llvm-lto` (i.e., running `llvm-lto`
on `llvm-lto.lto.bc`, an already-linked bitcode file dumped by ld64's
-save-temps option) that currently peaks at 15.3GB.

 1. Introduce `MDUser`, which inherits from `User`, and whose `Use`s
    must all be metadata.  The cost per operand is 1 pointer, vs. 4
    pointers in an `MDNode`.

 2. Create `MDLineTable` as the first subclass of `MDUser`.  Use normal
    fields (not `Value`s) for the line and column, and use `Use`
    operands for the metadata operands.

    On x86-64, this will save 104B / line table entry.  Linking
    `llvm-lto` uses ~7M line-table entries, so this on its own saves
    ~700MB.

    Sketch of class definition:

        class MDLineTable : public MDUser {
          unsigned Line;
          unsigned Column;
        public:
          static MDLineTable *get(unsigned Line, unsigned Column,
                                  MDNode *Scope);
          static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);
          static MDLineTable *getBase(MDLineTable *Inlined);

          unsigned getLine() const { return Line; }
          unsigned getColumn() const { return Column; }
          // Operand 0 is the scope; operand 1, if present, is inlinedAt.
          bool isInlined() const { return getNumOperands() == 2; }
          MDNode *getScope() const { return cast<MDNode>(getOperand(0)); }
          MDNode *getInlinedAt() const { return cast<MDNode>(getOperand(1)); }
        };

    Proposed assembly syntax:

        ; Not inlined.
        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)

        ; Inlined.
        !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
                                   inlinedAt: metadata !10)

        ; Column defaulted to 0.
        !7 = metadata !MDLineTable(line: 45, scope: metadata !9)

    (What colour should that bike shed be?)

 3. (Optional) Rewrite `DebugLoc` lookup tables.  My profiling shows
    that we have 3.5M entries in the `DebugLoc` side-vectors for 7M line
    table entries.  The cost of these is ~180B each, for another
    ~600MB.

    If we integrate a side-table of `MDLineTable`s into its uniquing,
    the overhead is only ~12B / line table entry, or ~80MB.  This saves
    ~520MB.

    This is somewhat orthogonal to redesigning the metadata format,
    but IMO it's worth doing as soon as it's possible.

 4. Create `GenericDebugMDNode`, a transitional subclass of `MDUser`
    through an intermediate class `DebugMDNode` with an
    allocation-time-optional `CallbackVH` available for referencing
    non-metadata.  Change `DIDescriptor` to wrap a `DebugMDNode` instead
    of an `MDNode`.

    This saves another ~960MB, for a running total of ~2GB.

    Proposed assembly syntax:

        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
                                          fields: "0\00clang 3.6\00...",
                                          operands: { metadata !8, ... })

        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
                                          fields: "global_var\00...",
                                          operands: { metadata !8, ... },
                                          handle: i32* @global_var)

    This syntax pulls the tag out of the current header-string, calls
    the rest of the header "fields", and includes the metadata operands
    in "operands".

 5. Incrementally create subclasses of `DebugMDNode`, such as
    `MDCompileUnit` and `MDSubprogram`.  Subclassed nodes replace the
    "fields" and "operands" catch-alls with explicit names for each
    operand.

    Proposed assembly syntax:

        !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",
                                    linkageName: "_Z3foov", file: metadata !8,
                                    function: i32 (i32)* @foo)

 6. Remove the dead code for `GenericDebugMDNode`.

 7. (Optional) Refactor `DebugMDNode` subclasses to minimize RAUW
    traffic during bitcode serialization.  Now that metadata types are
    known, we can write debug info out in an order that makes it cheap
    to read back in.

    Note that using `MDUser` will make RAUW much cheaper, since we're
    using the use-list infrastructure for most operands.  If RAUW isn't
    showing up in a profile, I may skip this.
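
To make steps 2 and 3 a little more concrete, here is a stand-alone C++
sketch that uses no LLVM headers.  `Scope`, `LineTable`, and
`LineTableContext` are hypothetical stand-ins for `MDNode`,
`MDLineTable`, and whatever ends up owning the uniquing table; modelling
inlinedAt as a pointer to another entry is also an assumption.  The
`std::map` is only for illustration and says nothing about the real
per-entry overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical stand-in for a scope node (an MDNode in the proposal).
struct Scope {};

// Mock of the proposed MDLineTable: line and column are plain fields,
// while scope and inlinedAt are pointer "operands".
class LineTable {
  unsigned Line;
  unsigned Column;
  const Scope *TheScope;
  const LineTable *InlinedAt; // Null when the entry is not inlined.

  friend class LineTableContext;
  LineTable(unsigned L, unsigned C, const Scope *S, const LineTable *IA)
      : Line(L), Column(C), TheScope(S), InlinedAt(IA) {}

public:
  unsigned getLine() const { return Line; }
  unsigned getColumn() const { return Column; }
  bool isInlined() const { return InlinedAt != nullptr; }
  const Scope *getScope() const { return TheScope; }
  const LineTable *getInlinedAt() const { return InlinedAt; }
};

// Hypothetical owner of the uniquing table: equal (line, column, scope,
// inlinedAt) keys always yield the same LineTable pointer.
class LineTableContext {
  using Key =
      std::tuple<unsigned, unsigned, std::uintptr_t, std::uintptr_t>;
  std::map<Key, std::unique_ptr<LineTable>> Entries;

public:
  const LineTable *get(unsigned Line, unsigned Column, const Scope *S,
                       const LineTable *InlinedAt = nullptr) {
    assert(S && "every entry needs a scope");
    Key K(Line, Column, reinterpret_cast<std::uintptr_t>(S),
          reinterpret_cast<std::uintptr_t>(InlinedAt));
    std::unique_ptr<LineTable> &Slot = Entries[K];
    if (!Slot) // First request for this key: create the entry.
      Slot.reset(new LineTable(Line, Column, S, InlinedAt));
    return Slot.get();
  }

  std::size_t size() const { return Entries.size(); }
};
```

Asking the context twice for line 45, column 7 in the same scope returns
the same pointer, and an inlined entry is uniqued separately from the
entry it was inlined at.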

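As a cross-check of the arithmetic in steps 2 through 4, the sketch
below multiplies out the quoted figures.  It assumes decimal megabytes
(1MB = 10^6 bytes) and treats the approximate counts as exact, so the
results land slightly above the rounded numbers in the text.  The
constant names are made up for this example.

```cpp
#include <cstdint>

// All inputs are the approximate figures quoted in the plan.
constexpr std::uint64_t BytesPerMB = 1000 * 1000;     // Assumption: decimal MB.
constexpr std::uint64_t LineTableEntries = 7000000;   // ~7M entries.
constexpr std::uint64_t SideVectorEntries = 3500000;  // ~3.5M entries.

// Step 2: 104B saved per line table entry.
constexpr std::uint64_t Step2SavedMB = LineTableEntries * 104 / BytesPerMB;

// Step 3: ~180B per side-vector entry today vs. ~12B per line table entry.
constexpr std::uint64_t Step3BeforeMB = SideVectorEntries * 180 / BytesPerMB;
constexpr std::uint64_t Step3AfterMB = LineTableEntries * 12 / BytesPerMB;
constexpr std::uint64_t Step3SavedMB = Step3BeforeMB - Step3AfterMB;

// Step 4: quoted directly as ~960MB.
constexpr std::uint64_t Step4SavedMB = 960;

// Running total after steps 2, 3, and 4.
constexpr std::uint64_t RunningTotalMB =
    Step2SavedMB + Step3SavedMB + Step4SavedMB;
```

The unrounded sum comes to roughly 2.2GB, consistent with the ~2GB
running total quoted in step 4.
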
Does this direction seem reasonable?  Any major problems I've missed?