[llvm-dev] [RFC] Lazy-loading of debug info metadata

Tue Mar 22 20:11:25 PDT 2016

On Tue, Mar 22, 2016 at 8:04 PM David Blaikie <dblaikie at gmail.com> wrote:

> +pcc, who had some other ideas/patch out for improving memory usage of
> debug info
> +Reid, who's responsible for the windows/CodeView/PDB debug info which is
> motivating some of the ideas about changes to type emission
>
>
So I discussed this with Adrian and Mehdi at the social last Thursday and
I'm getting set to finish the write-up. I think it'll have some bearing on
this proposal, since it'll change how we want to look at the format of
DISubprogram metadata. That said, most of it is orthogonal to the changes
Duncan is talking about here. It just puts the pressure on to get the
other proposal written up.

-eric

> So how does this relate, or not, to Peter's (pcc) work trying to reduce
> the DIE overhead during code gen? Are you folks chasing different memory
> bottlenecks? Are they both relevant (perhaps in different scenarios)?
>
> Baking into the IR more about types as units has pretty direct overlap
> with Reid/CodeView/etc - so, yeah, that'll take some discussion (but, as
> you say, it's not in your immediate plan anyway, so we can come back to
> that - but would be good for whoever gets there first to discuss it with
> the others)
>
> On Tue, Mar 22, 2016 at 7:28 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
>
>> I have some ideas to allow the BitcodeReader to lazy-load debug info
>> metadata, and wanted to air this on llvm-dev before getting too deep
>> into the code.
>>
>> Motivation
>> ==========
>>
>> Based on some analysis Mehdi ran (ping him for details), there are three
>> (related) compile-time bottlenecks we're seeing with `-flto=thin -g`:
>>
>>  a) Reading the large number of Metadata bitcode records in the global
>>     metadata block.  I'm talking about raw `BitStreamer` calls here.
>>
>>  b) Creating unnecessary `DI*` instances (that aren't relevant to code).
>>
>>  c) Emitting unnecessary `DI*` instances (that aren't relevant to code).
>>
>> Here is my recollection of some peak memory stats on a small testcase
>> during thin-LTO, which should be a decent indicator of (b):
>>
>>   - ~150MB: DILocation
>>   - ~100MB: DISubprogram
>>   - ~70MB: DILocalVariable
>>   - ~50MB: (cumulative) DIType descendants
>>
>> It looks, surprisingly, like types are not the primary bottleneck.
>>
>> There are caveats:
>>
>>   - `DISubprogram` declarations -- member function descriptors -- are
>>     part of the type hierarchy.
>>   - Most of the type hierarchy gets uniqued at parse time.
>>   - As a result, these data are a poor indicator for (a).
>>
>> Even so, non-types are substantial.
>>
>> Related work
>> ============
>>
>> Teresa has some post-processing in-place/in-review to avoid importing
>> metadata unnecessarily, but IIUC: it won't address (a) and (b), only
>> (c) (maybe I'm wrong?); and it only helps -flto=thin, not other
>> lazy-loaders.
>>
>> I heard a rumour that Eric has a grand plan to factor away the type
>> hierarchy -- awesome if true -- but I think most of this is worthwhile
>> regardless.
>>
>> Proposal
>> ========
>>
>> Short version
>> -------------
>>
>>  1. Serialize metadata in Function blocks where possible.
>>  2. Reverse the `DISubprogram`/`DICompileUnit` link.
>>  3. Create a `METADATA_SUBPROGRAM_BLOCK`.
>>
>> Type-related work Eric will make unnecessary if he's fast:
>>
>>  4. Remove `DICompositeType`s from `retainedTypes:`, similar to (2).
>>  5. Create a `METADATA_COMPOSITE_TYPE_BLOCK`, similar to (3).
>>
>> Long version
>> ------------
>>
>>  1. If a piece of metadata is referenced from only a single `Function`,
>>     serialize that metadata in the function's metadata block instead of
>>     the global metadata block.
>>
>>     This addresses problems (a) and (b), primarily targeting
>>     `DILocation`s.  It should pick up lots of other stuff, depending on
>>     how much inlining has happened.
>>
>>     (I have a draft of the writer side, still working on the reader.)
>>
>>  2. Reverse the `DISubprogram`/`DICompileUnit` link (David and I have
>>     talked about this in the past in barely-related threads).  The
>>     direct effect is that subprograms that are not pointed at by any
>>     code (!dbg attachments or @llvm.dbg.value intrinsics) get dropped.
>>
>>     This addresses problem (c).  If a consumer is only linking/loading a
>>     subset of a module's functions, this naturally filters subprograms
>>     to the relevant ones.  Also, with limited inlining (and assuming
>>     (1)), it addresses problems (a) and (b), too.
>>
>>     Adrian volunteered to implement this and is apparently almost ready
>>     to post a patch (still working on testcase update script logic I
>>     believe (probably other details, don't let me oversell it)).
>>
>>  3. Create a special `METADATA_SUBPROGRAM_BLOCK` for each `DISubprogram`
>>     in the global metadata block.  Store the relevant `DISubprogram` and
>>     all of the subprogram's `DILexicalBlock`s and `DILocalVariable`s.
>>     The block can be lazy-loaded on an all-or-nothing basis.
>>
>>     In combination with (2), this addresses (a) and (b) in cases that
>>     (1) doesn't catch.  A lazy-loading module will only load the
>>     subprogram blocks that get referenced.
>>
>>     (I have a basic design for this that accounts for references into
>>     the middle of a block; I'll see what happens when I flesh it out.)
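
The all-or-nothing loading described in (3) can be illustrated with a toy model. This is only a sketch under stated assumptions: `SubprogramBlock`, `LazyModule`, and `make_block` are hypothetical names invented for illustration, not BitcodeReader APIs, and the "decoding" is a plain Python callable standing in for reading bitcode records.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubprogramBlock:
    """Hypothetical stand-in for one per-subprogram metadata block."""
    name: str
    # Deferred decoder: returns every node stored in the block
    # (the subprogram plus its local variables in this toy model).
    decode: Callable[[], List[str]]
    nodes: List[str] = field(default_factory=list)
    loaded: bool = False

    def materialize(self) -> List[str]:
        # All-or-nothing: the first reference decodes the whole block.
        if not self.loaded:
            self.nodes = self.decode()
            self.loaded = True
        return self.nodes


class LazyModule:
    """Toy module that decodes subprogram blocks only when referenced."""

    def __init__(self, blocks: List[SubprogramBlock]) -> None:
        self.blocks: Dict[str, SubprogramBlock] = {b.name: b for b in blocks}

    def reference(self, subprogram: str) -> List[str]:
        return self.blocks[subprogram].materialize()

    def loaded_blocks(self) -> List[str]:
        return sorted(n for n, b in self.blocks.items() if b.loaded)


def make_block(name: str, local_vars: List[str]) -> SubprogramBlock:
    # Illustrative contents only; real blocks would hold bitcode records.
    return SubprogramBlock(
        name=name,
        decode=lambda: [f"DISubprogram({name})"]
        + [f"DILocalVariable({v})" for v in local_vars],
    )


module = LazyModule([make_block("foo", ["x", "y"]), make_block("bar", ["z"])])
foo_nodes = module.reference("foo")
```

In this model only the block for `foo` is ever decoded; `bar` stays untouched until something references it, which is the behaviour (3) is after.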
>>
>> I think this will solve the non-type bottlenecks.
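
As a rough illustration of items (1) and (2) in the list above, the sketch below models metadata as plain strings. It assumes a simplified world where each function lists the nodes it references; `partition_metadata` and `reachable_subprograms` are hypothetical helpers written for this example, not part of LLVM.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def partition_metadata(
    refs: Dict[str, List[str]]
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Split nodes into per-function blocks and a global block.

    `refs` maps a function name to the metadata nodes it references.
    A node used by exactly one function goes to that function's block;
    anything shared stays in the global block.
    """
    users: Dict[str, Set[str]] = defaultdict(set)
    for function, nodes in refs.items():
        for node in nodes:
            users[node].add(function)

    per_function: Dict[str, Set[str]] = {f: set() for f in refs}
    global_block: Set[str] = set()
    for node, funcs in users.items():
        if len(funcs) == 1:
            per_function[next(iter(funcs))].add(node)
        else:
            global_block.add(node)
    return per_function, global_block


def reachable_subprograms(
    code_refs: Dict[str, List[str]], loaded: List[str]
) -> Set[str]:
    """Subprograms kept when only the `loaded` functions are linked in.

    Once the compile unit no longer lists its subprograms, a subprogram
    survives only if the code of some loaded function points at it.
    """
    return {sp for f in loaded for sp in code_refs[f]}


# Item (1): locations are function-local, the shared type stays global.
refs = {"f": ["loc1", "loc2", "int"], "g": ["loc3", "int"]}
per_function, global_block = partition_metadata(refs)

# Item (2): loading only `f` keeps its subprogram and one inlined into it.
subprogram_refs = {"f": ["sp_f", "sp_inlined_h"], "g": ["sp_g"]}
kept = reachable_subprograms(subprogram_refs, ["f"])
```

With more inlining, more subprograms become shared between functions and fall back to the global block, which is where (3) is meant to help.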
>>
>> If Eric hasn't solved types by then, we can do similar things to the IR
>> for the debug info type hierarchy.
>>
>>  4. Implement my proposal to remove the `DICompositeType` name map from
>>     `retainedTypes:`.
>>
>>     http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160125/327936.html
>>
>>     Similar to (2) above, this will naturally filter the types that get
>>     linked in to the ones actually used by the code being linked.
>>
>>     It should also allow the reader to skip records for types that have
>>     already been loaded in the main module.
>>
>>  5. Create a special `METADATA_COMPOSITE_TYPE_BLOCK`, similar to (3) but
>>     for composite types and their members.  This avoids the raw bitcode
>>     reading overhead.  (This is totally undesigned at this point.)
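
The record-skipping idea in (4) might look roughly like the following. This is a sketch under an explicit assumption: each composite type carries a unique identifier string that the reader can check before decoding. The `load_types` helper, the record layout, and the example identifiers are all illustrative rather than taken from the linked proposal.

```python
from typing import Dict, Iterable, List, Tuple

# (identifier, payload); both fields are illustrative placeholders.
TypeRecord = Tuple[str, str]


def load_types(
    loaded: Dict[str, str], incoming: Iterable[TypeRecord]
) -> List[str]:
    """Merge `incoming` type records into `loaded`, skipping known ones.

    Returns the identifiers whose records were actually decoded.
    """
    decoded: List[str] = []
    for identifier, payload in incoming:
        if identifier in loaded:
            continue  # already present in the main module: skip the record
        loaded[identifier] = payload
        decoded.append(identifier)
    return decoded


main_module = {"_ZTS3Foo": "struct Foo"}
decoded = load_types(
    main_module,
    [("_ZTS3Foo", "struct Foo"), ("_ZTS3Bar", "struct Bar")],
)
```

Here only the record for `_ZTS3Bar` is decoded; the one for `_ZTS3Foo` is skipped because the main module already has it.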
>>
>>
>>
>