[llvm-dev] [RFC] Lazy-loading of debug info metadata
Duncan P. N. Exon Smith via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 22 19:28:55 PDT 2016
I have some ideas to allow the BitcodeReader to lazy-load debug info
metadata, and wanted to air this on llvm-dev before getting too deep
into the code.
Based on some analysis Mehdi ran (ping him for details), there are three
(related) compile-time bottlenecks we're seeing with `-flto=thin -g`:
a) Reading the large number of Metadata bitcode records in the global
metadata block. I'm talking about raw `BitStreamer` calls here.
b) Creating unnecessary `DI*` instances (that aren't relevant to code).
c) Emitting unnecessary `DI*` instances (that aren't relevant to code).
Here is my recollection of some peak memory stats on a small testcase
during thin-LTO, which should be a decent indicator of (b):
- ~150MB: DILocation
- ~100MB: DISubprogram
- ~70MB: DILocalVariable
- ~50MB: (cumulative) DIType descendants
It looks, surprisingly, like types are not the primary bottleneck.
There are caveats:
- `DISubprogram` declarations -- member function descriptors -- are
part of the type hierarchy.
- Most of the type hierarchy gets uniqued at parse time.
- As a result, these data are a poor indicator for (a).
Even so, non-types are substantial.
Teresa has some post-processing in-place/in-review to avoid importing
metadata unnecessarily, but IIUC: it won't address (a) and (b), only
(c) (maybe I'm wrong?); and it only helps -flto=thin, not other
I heard a rumour that Eric has a grand plan to factor away the type
hierarchy -- awesome if true -- but I think most of this is worthwhile
1. Serialize metadata in Function blocks where possible.
2. Reverse the `DISubprogram`/`DICompileUnit` link.
3. Create a `METADATA_SUBPROGRAM_BLOCK`.
Type-related work Eric will make unnecessary if he's fast:
4. Remove `DICompositeType`s from `retainedTypes:`, similar to (2).
5. Create a `METADATA_COMPOSITE_TYPE_BLOCK`, similar to (3).
1. If a piece of metadata is referenced from only a single `Function`,
serialize that metadata in the function's metadata block instead of
the global metadata block.
This addresses problems (a) and (b), primarily targeting
`DILocation`s. It should pick up lots of other stuff, depending on
how much inlining has happened.
(I have a draft of the writer side, still working on the reader.)
2. Reverse the `DISubprogram`/`DICompileUnit` link (David and I have
talked about this in the past in barely-related threads). The
direct effect is that subprograms that are not pointed at by any
code (!dbg attachments or @llvm.dbg.value intrinsics) get dropped.
This addresses problem (c). If a consumer is only linking/loading a
subset of a module's functions, this naturally filters subprograms
to the relevant ones. Also, with limited inlining (and assuming
(1)), it addresses problems (a) and (b), too.
Adrian volunteered to implement this and is apparently almost ready
to post a patch. I believe the testcase update script logic is still
in progress, and probably other details; don't let me oversell it.
3. Create a special `METADATA_SUBPROGRAM_BLOCK` for each `DISubprogram`
in the global metadata block. Store the relevant `DISubprogram` and
all of the subprogram's `DILexicalBlock`s and `DILocalVariable`s.
The block can be lazy-loaded on an all-or-nothing basis.
In combination with (2), this addresses (a) and (b) in cases that
(1) doesn't catch. A lazy-loading module will only load the
subprogram blocks that get referenced.
(I have a basic design for this that accounts for references into
the middle of a block; I'll see what happens when I flesh it out.)
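To make the placement rule in (1) concrete, here is a minimal sketch of how a writer might decide which block a node goes in. It is a toy model, not the draft mentioned above: `NodeId`, `Use`, `Placement`, and `placeMetadata` are hypothetical names, none of them are LLVM types, and the sketch assumes the list of uses has already been flattened so that each entry names the function that ultimately reaches the node (or no function, for a module-level use).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not LLVM types.
using NodeId = std::size_t;
using FunctionName = std::string;

// One reference to a metadata node. An empty Fn means the reference comes
// from module level (e.g. named metadata) rather than from a function body.
struct Use {
  NodeId Node;
  std::optional<FunctionName> Fn;
};

struct Placement {
  std::set<NodeId> Global;                               // global metadata block
  std::map<FunctionName, std::set<NodeId>> PerFunction;  // function-local blocks
};

// Assumption: Uses is already transitive, i.e. every function that can reach
// a node (directly or through other nodes) has an entry for it.
Placement placeMetadata(std::size_t NumNodes, const std::vector<Use> &Uses) {
  std::vector<std::set<FunctionName>> Users(NumNodes);
  std::vector<bool> UsedAtModuleLevel(NumNodes, false);

  for (const Use &U : Uses) {
    assert(U.Node < NumNodes && "use refers to an unknown node");
    if (U.Fn)
      Users[U.Node].insert(*U.Fn);
    else
      UsedAtModuleLevel[U.Node] = true;
  }

  Placement P;
  for (NodeId N = 0; N < NumNodes; ++N) {
    // Exactly one referencing function and no module-level use: the node can
    // live in that function's block. Everything else (shared, module-level,
    // or unreferenced) stays in the global block.
    if (!UsedAtModuleLevel[N] && Users[N].size() == 1)
      P.PerFunction[*Users[N].begin()].insert(N);
    else
      P.Global.insert(N);
  }
  return P;
}
```

In this model, a location used only by `f` moves into `f`'s block, while anything shared between `f` and `g` stays global. That is why the payoff in (1) depends on how much inlining has happened.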
I think this will solve the non-type bottlenecks.
If Eric hasn't solved types by then, we can do similar things to the IR
for the debug info type hierarchy.
4. Implement my proposal to remove the `DICompositeType` name map from
Similar to (2) above, this will naturally filter the types that get
linked in to the ones actually used by the code being linked.
It should also allow the reader to skip records for types that have
already been loaded in the main module.
5. Create a special `METADATA_COMPOSITE_TYPE_BLOCK`, similar to (3) but
for composite types and their members. This avoids the raw bitcode
reading overhead. (This is totally undesigned at this point.)
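The "natural filtering" that (2) and (4) rely on is just graph reachability: once the container stops pointing at its children, only children that something else references survive. The sketch below is a toy model under stated assumptions, not LLVM code. `NodeId` and `reachableFrom` are hypothetical names, and the two example graphs are illustrative encodings of "compile unit points at subprograms" versus "subprograms point at compile unit".

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a metadata node; not an LLVM type.
using NodeId = std::size_t;

// Edges[N] lists the nodes that node N points at. Returns every node that can
// be reached by following edges from any of the roots (roots included).
std::set<NodeId> reachableFrom(const std::vector<std::vector<NodeId>> &Edges,
                               const std::vector<NodeId> &Roots) {
  std::set<NodeId> Seen;
  std::vector<NodeId> Worklist(Roots.begin(), Roots.end());
  while (!Worklist.empty()) {
    NodeId N = Worklist.back();
    Worklist.pop_back();
    assert(N < Edges.size() && "edge or root refers to an unknown node");
    if (!Seen.insert(N).second)
      continue; // already visited
    for (NodeId Succ : Edges[N])
      Worklist.push_back(Succ);
  }
  return Seen;
}
```

As a usage example, take node 0 as a compile unit, node 1 as a subprogram referenced by code, and node 2 as a subprogram that no code points at, with roots `{0, 1}`. With edges `{{1, 2}, {}, {}}` (unit points at subprograms) all three nodes are reachable. With edges `{{}, {0}, {0}}` (subprograms point at the unit) node 2 is no longer reachable, so a linker or loader that only follows references would drop it.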