[llvm-dev] [RFC] Lazy-loading of debug info metadata

Teresa Johnson via llvm-dev llvm-dev at lists.llvm.org
Wed Mar 23 13:07:46 PDT 2016


On Wed, Mar 23, 2016 at 11:10 AM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2016-Mar-23, at 09:42, Teresa Johnson <tejohnson at google.com> wrote:
> >
> >
> >
> > On Wed, Mar 23, 2016 at 9:31 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> >
> > > On 2016-Mar-23, at 07:17, Teresa Johnson <tejohnson at google.com> wrote:
> > >
> > >
> > >
> > > On Tue, Mar 22, 2016 at 7:28 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> > > I have some ideas to allow the BitcodeReader to lazy-load debug info
> > > metadata, and wanted to air this on llvm-dev before getting too deep
> > > into the code.
> > >
> > > Motivation
> > > ==========
> > >
> > > > Based on some analysis Mehdi ran (ping him for details), there are three
> > > > (related) compile-time bottlenecks we're seeing with `-flto=thin -g`:
> > >
> > >  a) Reading the large number of Metadata bitcode records in the global
> > >     metadata block.  I'm talking about raw `BitStreamer` calls here.
> > >
> > > >  b) Creating unnecessary `DI*` instances (that aren't relevant to code).
> > >
> > > Creating in the source module, or in the dest module during linking?
> > >
> > >
> > > >  c) Emitting unnecessary `DI*` instances (that aren't relevant to code).
> > >
> > > Here is my recollection of some peak memory stats on a small testcase
> > > during thin-LTO, which should be a decent indicator of (b):
> > >
> > >   - ~150MB: DILocation
> > >   - ~100MB: DISubprogram
> > >   - ~70MB: DILocalVariable
> > > >   - ~50MB: (cumulative) DIType descendants
> > >
> > > > It looks, surprisingly, like types are not the primary bottleneck.
> > >
> > > There are caveats:
> > >
> > >   - `DISubprogram` declarations -- member function descriptors -- are
> > >     part of the type hierarchy.
> > >   - Most of the type hierarchy gets uniqued at parse time.
> > >   - As a result, these data are a poor indicator for (a).
> > >
> > > Even so, non-types are substantial.
> > >
> > > Related work
> > > ============
> > >
> > > Teresa has some post-processing in-place/in-review to avoid importing
> > > metadata unnecessarily, but IIUC: it won't address (a) and (b), only
> > > (c) (maybe I'm wrong?); and it only helps -flto=thin, not other
> > > lazy-loaders.
> > >
> > > > That is D16440. It reduces the metadata imported into the dest module
> > > > (not sure whether that falls into (b) or just (c)).
> > >
> > > > It could actually help full LTO as well, since I also added support for
> > > > not linking in unneeded DISubprograms for full LTO at the same time as
> > > > ThinLTO in r256003. But right now the changes in the patch are guarded so
> > > > they only happen under ThinLTO, since some of the other things we prune
> > > > from the imported DICompileUnit only apply to ThinLTO. I could restructure
> > > > that a bit to get the reduced retained types importing to occur for full
> > > > LTO as well.
> > >
> > >
> > > I heard a rumour that Eric has a grand plan to factor away the type
> > > hierarchy -- awesome if true -- but I think most of this is worthwhile
> > > regardless.
> > >
> > > Proposal
> > > ========
> > >
> > > Short version
> > > -------------
> > >
> > >  1. Serialize metadata in Function blocks where possible.
> > >  2. Reverse the `DISubprogram`/`DICompileUnit` link.
> > >  3. Create a `METADATA_SUBPROGRAM_BLOCK`.
> > >
> > > Type-related work Eric will make unnecessary if he's fast:
> > >
> > >  4. Remove `DICompositeType`s from `retainedTypes:`, similar to (2).
> > >  5. Create a `METADATA_COMPOSITE_TYPE_BLOCK`, similar to (3).
> > >
> > > Long version
> > > ------------
> > >
> > >  1. If a piece of metadata is referenced from only a single `Function`,
> > >     serialize that metadata in the function's metadata block instead of
> > >     the global metadata block.
> > >
> > >     This addresses problems (a) and (b), primarily targeting
> > >     `DILocation`s.  It should pick up lots of other stuff, depending on
> > >     how much inlining has happened.
> > >
> > >     (I have a draft of the writer side, still working on the reader.)
> > >
> > >  2. Reverse the `DISubprogram`/`DICompileUnit` link (David and I have
> > >     talked about this in the past in barely-related threads).  The
> > >     direct effect is that subprograms that are not pointed at by any
> > >     code (!dbg attachments or @llvm.dbg.value intrinsics) get dropped.
> > >
> > > >     This addresses problem (c).  If a consumer is only linking/loading a
> > >     subset of a module's functions, this naturally filters subprograms
> > >     to the relevant ones.  Also, with limited inlining (and assuming
> > >     (1)), it addresses problems (a) and (b), too.
> > >
> > >     Adrian volunteered to implement this and is apparently almost ready
> > >     to post a patch (still working on testcase update script logic I
> > >     believe (probably other details, don't let me oversell it)).
> > >
> > > > As noted in the review thread for my D16440, I'll need to adjust that
> > > > handling once this link reversal goes in.
> > >
> > >
> > > >  3. Create a special `METADATA_SUBPROGRAM_BLOCK` for each `DISubprogram`
> > > >     in the global metadata block.  Store the relevant `DISubprogram` and
> > >     all of the subprogram's `DILexicalBlock`s and `DILocalVariable`s.
> > >     The block can be lazy-loaded on an all-or-nothing basis.
> > >
> > >     In combination with (2), this addresses (a) and (b) in cases that
> > >     (1) doesn't catch.  A lazy-loading module will only load the
> > >     subprogram blocks that get referenced.
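
To make the all-or-nothing loading in (3) concrete, here is a toy Python model of a reader that only parses a subprogram's block the first time that subprogram is asked for. It is only a sketch under invented names: `LazyBlocks`, `get`, `records_parsed`, and the block contents (`foo`, `bar`, `x`, `y`) are hypothetical and do not correspond to any BitcodeReader API or record layout.

```python
# Toy model of all-or-nothing lazy loading; not the BitcodeReader API.
class LazyBlocks:
    def __init__(self, blocks):
        # blocks: subprogram name -> records stored in that subprogram's block
        self._blocks = blocks
        self._loaded = {}
        self.records_parsed = 0  # stands in for the raw bitstream reading cost

    def get(self, subprogram):
        # Parse the whole block on first reference, then reuse the result.
        if subprogram not in self._loaded:
            records = list(self._blocks[subprogram])
            self.records_parsed += len(records)
            self._loaded[subprogram] = records
        return self._loaded[subprogram]


blocks = {
    "foo": ["DISubprogram(foo)", "DILexicalBlock", "DILocalVariable(x)"],
    "bar": ["DISubprogram(bar)", "DILocalVariable(y)"],
}
reader = LazyBlocks(blocks)
reader.get("bar")
reader.get("bar")  # second reference hits the cache; nothing is re-parsed
```

In this model only the two records in `bar`'s block are ever parsed; `foo`'s block is skipped entirely until something references it.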
> > >
> > > > I'm not sure I understand this part - if the debug info for each
> > > > subprogram can be divided into separate blocks, why can't it be moved into
> > > > the function's metadata block? I.e. what happens for debug metadata that is
> > > > referenced by multiple functions, which I thought was all that was going to
> > > > remain in the global metadata block? Oh - the DISubprogram may be
> > > > referenced in other places within the global metadata so cannot move into
> > > > the function metadata block. So debug metadata only reached from that
> > > > DISubprogram is included in its block, but any debug metadata referenced by
> > > > multiple DISubprograms would not be located within one of these blocks?
> >
> > The problem is that the same subprogram may be referenced from multiple
> > functions.  Consider:
> > --
> > void sink();
> > __attribute__((always_inline)) static inline void foo() { sink(); }
> > void caller1() { foo(); }
> > void caller2() { foo(); }
> > void unrelated() {}
> > --
> >
> > The IR for caller1 and caller2 will both reference the subprogram for
> > foo, something like:
> > --
> > define void @caller1() !dbg !2 {
> >   call void @sink(), !dbg !5
> >   ret void, !dbg !7
> > }
> > define void @caller2() !dbg !3 {
> >   call void @sink(), !dbg !8
> >   ret void, !dbg !10
> > }
> > define void @unrelated() !dbg !4 {
> >   ret void, !dbg !11
> > }
> >
> > !1 = distinct !DISubprogram(name: "foo")
> > !2 = distinct !DISubprogram(name: "caller1")
> > !3 = distinct !DISubprogram(name: "caller2")
> > !4 = distinct !DISubprogram(name: "unrelated")
> >
> > !5 = !DILocation(line: 2, scope: !1, inlinedAt: !6)
> > !6 = distinct !DILocation(line: 3, scope: !2)
> > !7 = !DILocation(line: 3, scope: !2)
> >
> > !8 = !DILocation(line: 2, scope: !1, inlinedAt: !9)
> > !9 = distinct !DILocation(line: 4, scope: !3)
> > !10 = !DILocation(line: 4, scope: !3)
> >
> > !11 = !DILocation(line: 5, scope: !4)
> > --
> > In this example, !1 (subprogram for foo) is referenced from both
> > @caller1 and @caller2, so we need some common place to save it.
> >
> > We still want to lazy-load it (and the local variables and lexical
> > blocks underneath it), so that if the loader only wants @unrelated,
> > !1 (and its vars/blocks) never gets read.
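
A rough way to picture this on the example above: compute which functions reach each metadata node, keep single-function nodes with that function, and put everything else somewhere shared. The Python below is a toy model, not BitcodeWriter code; `OPERANDS`, `ATTACHMENTS`, `reachable`, `reaching_functions`, and `placement` are names invented here. It only knows the references spelled out in the example, so it ignores any other global reference (for instance from a compile unit) that would keep a node out of a function block.

```python
# Toy model of the example IR above; not LLVM code.
# Metadata node -> nodes it references (scope:, inlinedAt:).
OPERANDS = {
    "!1": [], "!2": [], "!3": [], "!4": [],
    "!5": ["!1", "!6"], "!6": ["!2"], "!7": ["!2"],
    "!8": ["!1", "!9"], "!9": ["!3"], "!10": ["!3"],
    "!11": ["!4"],
}
# Function -> metadata attached directly (!dbg on the function and its instructions).
ATTACHMENTS = {
    "@caller1": ["!2", "!5", "!7"],
    "@caller2": ["!3", "!8", "!10"],
    "@unrelated": ["!4", "!11"],
}


def reachable(roots):
    """Return every node transitively referenced from the given roots."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(OPERANDS[node])
    return seen


def reaching_functions():
    """Map each metadata node to the set of functions that reach it."""
    users = {node: set() for node in OPERANDS}
    for fn, roots in ATTACHMENTS.items():
        for node in reachable(roots):
            users[node].add(fn)
    return users


def placement():
    """Single-function nodes go with that function; the rest are shared."""
    result = {}
    for node, fns in reaching_functions().items():
        result[node] = next(iter(fns)) if len(fns) == 1 else "shared"
    return result
```

With these inputs, `!1` is the only node that lands in the shared location, and loading just `@unrelated` reaches only `!4` and `!11`.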
> >
> > Right, that's what I realized halfway through writing the above
> > paragraph. But my remaining question (the last sentence) is what happens to
> > other debug metadata nodes that are referenced, perhaps indirectly, by
> > multiple DISubprograms? Some would be composite types; I guess they would
> > be placed in the METADATA_COMPOSITE_TYPE_BLOCK you mention below. But I
> > believe there are going to be some other uniqued nodes that are not
> > composite types and are referenced via multiple DISubprograms or other
> > composite types?
>
> It looks roughly like most things are either: (i) DILocations;
> (ii) DISubprogram (definitions), DILexicalBlocks, and
> DILocalVariables; or (iii) DICompositeType, DISubprogram
> declarations, and other parts of DICompositeType.  I was thinking
> we'd see what was left at that point, and then if necessary, find a
> way to structure the rest.
>
> It's possible we'll want to emit some uniqued nodes multiple times,
> sacrificing bitcode size for lazy-loading performance.  We already
> do this for constants in the IR: if a constant is not referenced
> globally, it's emitted in each Function that references it.  We
> could do the same with, e.g., DISubroutineType.
>
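
The size-versus-laziness trade-off described above can be sketched as follows. This is a toy Python model with hypothetical names (`emit`, `users`, `globally_referenced`) and made-up node labels, not the actual writer logic: a node that is referenced globally is emitted once in the global block, and any other node is emitted once per function that references it.

```python
# Toy model of duplicating uniqued nodes per function; not LLVM code.
def emit(users, globally_referenced):
    """users: node -> set of functions referencing it."""
    global_block, function_blocks = [], {}
    for node, fns in sorted(users.items()):
        if node in globally_referenced or not fns:
            # Referenced from global scope (or unused by functions): emit once.
            global_block.append(node)
        else:
            # Otherwise emit one copy per referencing function.
            for fn in sorted(fns):
                function_blocks.setdefault(fn, []).append(node)
    return global_block, function_blocks


users = {
    "DISubroutineType(void ())": {"@caller1", "@caller2"},
    "DICompositeType(S)": {"@caller1"},
}
global_block, function_blocks = emit(
    users, globally_referenced={"DICompositeType(S)"}
)
```

Here the subroutine type is written twice, once per caller, so a loader that only wants one function never has to touch the global block for it.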

Thanks, that makes sense.

Teresa


>
> > Thanks,
> > Teresa
> >
> >
> > >
> > >
> > >     (I have a basic design for this that accounts for references into
> > > >     the middle of a block; I'll see what happens when I flesh it out.)
> > >
> > > I think this will solve the non-type bottlenecks.
> > >
> > > If Eric hasn't solved types by then, we can do similar things to the IR
> > > for the debug info type hierarchy.
> > >
> > >  4. Implement my proposal to remove the `DICompositeType` name map from
> > >     `retainedTypes:`.
> > >
> > >
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160125/327936.html
> > >
> > >     Similar to (2) above, this will naturally filter the types that get
> > >     linked in to the ones actually used by the code being linked.
> > >
> > >     It should also allow the reader to skip records for types that have
> > >     already been loaded in the main module.
> > >
> > > > The ValueMapper or something will need to figure out which types
> > > > referenced by UUID to map/link in to the dest module. Currently the
> > > > ValueMapper does not follow UUID references, but these are brought in when
> > > > the DICompileUnit is mapped since they are all in the retained types list.
> > >
> > >
> > > >  5. Create a special `METADATA_COMPOSITE_TYPE_BLOCK`, similar to (3) but
> > >     for composite types and their members.  This avoids the raw bitcode
> > >     reading overhead.  (This is totally undesigned at this point.)
> > >
> > > > Ditto here - any metadata referenced by multiple composite types does
> > > > not go into a block, right?
> > >
> > > Thanks,
> > > Teresa
> > >
> > >
> > > --
> > > Teresa Johnson |       Software Engineer |     tejohnson at google.com
> |  408-460-2413
> >
> >
> >
> >
> > --
> > Teresa Johnson |       Software Engineer |     tejohnson at google.com |
> 408-460-2413
>
>


-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413