[llvm-dev] [RFC] Lazy-loading of debug info metadata

Duncan P. N. Exon Smith via llvm-dev llvm-dev at lists.llvm.org
Mon Apr 18 08:08:58 PDT 2016


> 
>> On 2016-Apr-15, at 17:12, David Blaikie <dblaikie at gmail.com> wrote:
>> 
>> 
>> 
>> On Fri, Apr 15, 2016 at 4:04 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> 
>> > On 2016-Apr-15, at 14:53, David Blaikie <dblaikie at gmail.com> wrote:
>> >
>> >
>> >
>> >> On Fri, Apr 15, 2016 at 2:27 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> >>
>> >> > On 2016-Apr-15, at 10:27, David Blaikie <dblaikie at gmail.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Apr 15, 2016 at 9:50 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> >> > Since I haven't heard any objections to the direction, I'm planning to
>> >> > commit this (step 4) when I find some cycles; likely over the weekend.
>> >> >
>> >> > To make this more concrete (in case someone is silently concerned) I've
>> >> > posted my WIP patches below.  They apply cleanly to r266414.  There are
>> >> > a few mechanical changes missing that are tracked in the commit
>> >> > messages (such as LangRef, or new tests in some cases).  To get
>> >> > `ninja check` to pass after the final commit you'll need to run the
>> >> > attached upgrade script.  The clang changes are truly uninteresting so
>> >> > I've left them out.
>> >> >
>> >> >   - 0001/2: Prep commits I need to flush out (sorry for the noise).
>> >> >   - 0003: Add an explicit type map for ODR type uniquing.
>> >> >   - 0004: Prep.
>> >> >   - 0005: Add ODR type uniquing of members.
>> >> >

FYI, I hit some major problems with all of this.

The short version is that I was testing -flto memory usage with the wrong
command-line options (-gline-tables-only instead of -g), and now that
I've done it correctly I've discovered a major memory blowup.  I'm
still investigating.

I also have a question at the end about a semantic problem I uncovered.

--

The attached patches were supposed to finish this off -- they've been
mostly ready since Saturday -- but I had a big surprise when I finally
configured correctly (apparently the Apple CMake caches default to
-gline-tables-only for RelWithDebInfo): 30GB peak memory for linking clang.

Then I went back and ran a fresh bootstrap of ToT, and got 27GB.  This is
pretty terrible (we were down to around 17GB back in October).  I'd been
doing spot checks of memory usage (glancing at top) and it was
surprisingly low; I just assumed someone had made improvements when I
wasn't looking.  Shame on me :(.

(I do have a bot set up that is supposed to catch major regressions like
this (it doesn't track memory usage directly, but it does a half-
bootstrap of -flto=full -g).  Except at some point it switched to using
a CMake cache, and it has been using -gline-tables-only as well:
  http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/
  http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/8599/consoleFull
)

I hope the blowup is from the work I've done over the last couple of
weeks; I'll continue investigating today and revert whatever I have to.
Sorry for the noise and the breakage.  (It's also possible this is not a
regression from my recent work, but that would be harder to recover from;
let's hope it was from me.)

--

I'm also concerned about the semantics of a particular case I uncovered
while debugging correctness issues.  If a type is defined inside a
function, we create a DICompositeType (CT) whose "scope:" is the
DISubprogram (SP) for that function's definition.  This SP is not part of
the type graph; it's the actual definition that eventually describes the
code.

However, the CT has an ODR identifier, which means it will get uniqued
by the DITypeMap.  This seems wrong.  If the function is defined in a
header, you can end up with two copies of this type whose scopes are
completely unrelated.  Here's what that looks like without the type map:
--
!0 = distinct !DICompileUnit()
!1 = distinct !DISubprogram(name: "foo", unit: !0)
!2 = distinct !DICompositeType(name: "T", identifier: "foo_T", scope: !1)
!3 = distinct !DICompileUnit()
!4 = distinct !DISubprogram(name: "foo", unit: !3)
!5 = distinct !DICompositeType(name: "T", identifier: "foo_T", scope: !4)
--
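
For contrast, here's a rough sketch of what I'd expect after the DITypeMap
uniques on "foo_T" (illustrative only, assuming it happens to keep the
first copy it sees):
--
!0 = distinct !DICompileUnit()
!1 = distinct !DISubprogram(name: "foo", unit: !0)
!2 = distinct !DICompositeType(name: "T", identifier: "foo_T", scope: !1)
!3 = distinct !DICompileUnit()
!4 = distinct !DISubprogram(name: "foo", unit: !3)
; the second copy of "T" has been uniqued to !2, so the local type for !4's
; copy of foo is now scoped to !1, a subprogram definition from an
; unrelated compile unit
--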

I had thought that when a !DICompositeType has an 'identifier:' (i.e.,
follows ODR rules), its 'scope:' must also have an 'identifier:', or at
least follow ODR rules.  That's clearly not always the case right now.
There are two simple ways to make it so:

  1. Drop the 'identifier:' from composite types local to a subprogram.

  2. Change the 'scope:' to point at a *declaration* of the subprogram
     (which lives in the type graph) and unique them there (and disallow
     having 'identifier:' types point at subprogram definitions).
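
To make option 2 concrete, here's a minimal, hand-written sketch of the
shape I have in mind (untested; the exact fields are up for discussion):
--
!0 = distinct !DICompileUnit()
!1 = !DISubprogram(name: "foo", isDefinition: false)
!2 = distinct !DICompositeType(name: "T", identifier: "foo_T", scope: !1)
!3 = distinct !DISubprogram(name: "foo", unit: !0, declaration: !1)
--
Here the CT's 'scope:' points at the uniqued declaration !1 (which lives
in the type graph) rather than at the definition !3, so uniquing "foo_T"
never pulls a subprogram definition from another compile unit into the
type graph.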

I'd appreciate feedback on whether either of these is a good idea.

(Another way of looking at this is that I'd imagined there were two
parts of the debug info graph: the "types" part and the "codegen" part
(by which I mean the stuff that describes the actual emitted object
code), and I'd imagined that the "types" part never referenced the
"codegen" part.  However, it does in this case.)

--

(the patches that increased memory from 27GB to 30GB)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0006-IR-Remove-DITypeRef-DIScopeRef-and-DINodeRef-llvm.patch
Type: application/octet-stream
Size: 322088 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160418/bef754e6/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0007-WIP-DebugInfo-Adapt-to-loss-of-DITypeRef-in-LL-clang.patch
Type: application/octet-stream
Size: 67049 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160418/bef754e6/attachment-0003.obj>
-------------- next part --------------



>> >> > What's this ^ for?
>> >> >
>> >> > In theory, the list of /members/ of a type shouldn't vary between instances. The special cases (implicit special members, member function template specializations, and nested types) don't appear in the member list - instead they just list the type as their scope (so it's one directional, instead of bidirectional)
>> >> >
>> >> > Is this to handle ODR violations? We currently don't handle them, right - we just pick one instance of the type & the user gets to keep the pieces. Is there a particular motivation for changing that? Some place where it's more problematic, we have to support it, etc?
>> >>
>> >> This is a strange one; I probably should have called it out actually.
>> >> It's a bugfix vs. ToT; and without it, 0006 regresses DWARF output.
>> >>
>> >> File and line can change without violating ODR, which means that we
>> >> end up with duplicate members.  I think I may have written something
>> >> up in the WIP commit message for 0005, but also have a look in tree at
>> >> test/Linker/type-unique-odr-a.ll.
>> >>
>> >> It's legal, e.g., to have the following in two separate files, as long
>> >> as the sequence of tokens is identical:
>> >> --
>> >> struct S { void foo(); };
>> >> --
>> >> However, the DISubprogram(isDefinition: false) for S::foo will not be
>> >> structurally equivalent (since the 'file:' node won't match), so we
>> >> currently end up with two metadata nodes.  The same thing happens for
>> >> the DICompositeType for `S` (and also applies to any fields).
>> >>
>> >> The checked-in testcase confirms that the description of `S` only has
>> >> one copy of `S::foo` in the DWARF output.  (The testcase doesn't provide
>> >> good enough coverage actually; it gets lucky because the definition of
>> >> S::foo is in the second file being linked (type-unique-odr-b.ll).  If
>> >> you reverse the order of the files, then `S` will have two copies of
>> >> `S::foo`.)
>> >>
>> >> 0005 solves the problem up front (and in both directions; I'll add a RUN
>> >> line to test/Linker/type-unique-odr-a.ll).
>> >>
>> >> I came across this because the testcase started failing with 0006.  The
>> >> current (but not-quite-sufficient) logic relies on the resolve() logic
>> >> in DwarfDebug.cpp, which indirects through the merged 'retainedTypes:'
>> >> map (the asymmetry in building the merged map explains why the in-tree
>> >> logic only works for one linking direction).
>> >
>> > Not sure I'm quite following.
>> >
>> > I think I understand the circumstance you're describing (indeed, the type itself can be defined on different lines/files without violating the ODR, or its members, or, etc... ). But if the subprogram declaration is in a different file in one definition - that'd make the composite type itself different (metadata uniquing - composite type references the subprogram, etc).
>> >
>> > But we should only ever find one version of the type (all references to the type should be via string-based DITypeRefs). Sure, we'll pick one at random, but when/how would we get two?
>> 
>> We get two because one of the modules contributes a definition of S::foo,
>> and it may be pointing at the declaration from a version of S that isn't
>> blessed.  If you reverse the RUN line in the in-tree testcase you can see
>> what happens, but here's a quick hand-written version:
>> --
>> !llvm.dbg.cus = !{!0, !4}
>> 
>> !0 = distinct !DICompileUnit(file: !1, retainedTypes: !{!2})
>> !1 = !DIFile(filename: "a.cpp")
>> !2 = !DICompositeType(file: !1, name: "S", elements: !{!3},
>>       identifier: "_ZTS1S")
>> !3 = !DISubprogram(isDefinition: false, file: !1, name: "foo",
>>       scope: !2, linkageName: "_ZN1S3fooEv")
>> 
>> !4 = distinct !DICompileUnit(file: !5, retainedTypes: !{!6})
>> !5 = !DIFile(filename: "b.cpp")
>> !6 = !DICompositeType(file: !5, name: "S", elements: !{!7},
>>       identifier: "_ZTS1S")
>> !7 = !DISubprogram(isDefinition: false, file: !5, name: "foo",
>>       scope: !6, linkageName: "_ZN1S3fooEv")
>> 
>> define void @_ZN1S3fooEv() !dbg !8 {
>>   ret void
>> }
>> 
>> !8 = distinct !DISubprogram(isDefinition: true, file: !5, name: "foo",
>>       unit: !4, linkageName: "_ZN1S3fooEv", declaration: !7)
>> --
>> 
>> If DwarfDebug picks !6 for the canonical "_ZTS1S", then there's no
>> problem.  !6 points at its function !7, and the definition !8 also
>> points at !7.
>> 
>> On the other hand, if DwarfDebug picks !2 for "_ZTS1S", then !2 will
>> pull in !3 and !8 will still pull in !7.  The emitted DWARF will then
>> contain both !3 and !7 inside of !2.
> 
> Ah, thanks for explaining it, I understand now. The problem is that the definition reaches into the type blob via the declaration with a direct reference.
> 
> We're going to want to abstract that reference away at some point to cross the same boundary as for type references, I think - I imagine that's what your patch/plan is meant to do?

(I missed this question before.  I didn't exactly abstract it, just
canonicalized it.  I think we had a discussion when I committed it in
r266548.)

> As I think Reid chimed in here or elsewhere, he's going to need something like that to cross the boundary into the opaque type blob for CodeView output too, I imagine.
> 
>> >
>> > >
>> > >   - 0006: Remove DITypeRef (string-based references) and strip down
>> > >     'retainedTypes:'.
>> > >
>> > > Once this is done, I expect the bitcode block layout improvements for
>> > > lazy-loading of subprograms and composite types (steps 3 and 5 from
>> > > this RFC) to be fairly straightforward.
>> > >
>> > >
>> > >
>> > >
>> > > > On 2016-Mar-29, at 19:11, Eric Christopher <echristo at gmail.com> wrote:
>> > > >
>> > > > I have no objections to any of this FWIW :)
>> > > >
>> > > > -eric
>> > > >
>> > > > On Tue, Mar 29, 2016 at 6:46 PM Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> > > >
>> > > > > On 2016-Mar-22, at 20:11, Eric Christopher <echristo at gmail.com> wrote:
>> > > > >
>> > > > >> On Tue, Mar 22, 2016 at 8:04 PM David Blaikie <dblaikie at gmail.com> wrote:
>> > > > >> +pcc, who had some other ideas/patch out for improving memory usage of debug info
>> > > > >> +Reid, who's responsible for the windows/CodeView/PDB debug info which is motivating some of the ideas about changes to type emission
>> > > > >
>> > > > > So I discussed this with Adrian and Mehdi at the social last Thursday and I'm getting set to finish the write-up. I think it'll have some bearing on this proposal, as it'll somewhat change how we want to look at the format of DISubprogram metadata.
>> > > >
>> > > > (The interesting bit here is to make a clearer split between
>> > > > DISubprogram declarations (part of the type hierarchy) and
>> > > > DISubprogram definitions (part of the code/line table/variable
>> > > > locations).  I think that'll end up being mostly orthogonal to what
>> > > > I'm trying to do.)
>> > > >
>> > > > > That said, most of it is orthogonal to the changes Duncan is talking about here. Just puts the pressure on to get the other proposal written up.
>> > > >
>> > > > Which is now here:
>> > > > http://lists.llvm.org/pipermail/llvm-dev/2016-March/097773.html
>> > > >
>> > > > >> Baking into the IR more about types as units has pretty direct overlap with Reid/CodeView/etc - so, yeah, that'll take some discussion (but, as you say, it's not in your immediate plan anyway, so we can come back to that - but it would be good for whoever gets there first to discuss it with the others)
>> > > >
>> > > > After thinking for a few days, I don't think this will bake anything
>> > > > new into the IR.  If anything it removes a few special cases.
>> > > >
>> > > > More at the bottom.
>> > > >
>> > > > >>> Motivation
>> > > > >>> ==========
>> > > > >>>
>> > > > >>> Based on some analysis Mehdi ran (ping him for details), there are three
>> > > > >>> (related) compile-time bottlenecks we're seeing with `-flto=thin -g`:
>> > > > >>>
>> > > > >>>  a) Reading the large number of Metadata bitcode records in the global
>> > > > >>>     metadata block.  I'm talking about raw `BitStreamer` calls here.
>> > > > >>>
>> > > > >>>  b) Creating unnecessary `DI*` instances (that aren't relevant to code).
>> > > > >>>
>> > > > >>>  c) Emitting unnecessary `DI*` instances (that aren't relevant to code).
>> > > > >>>
>> > > > >>> Here is my recollection of some peak memory stats on a small testcase
>> > > > >>> during thin-LTO, which should be a decent indicator of (b):
>> > > > >>>
>> > > > >>>   - ~150MB: DILocation
>> > > > >>>   - ~100MB: DISubprogram
>> > > > >>>   - ~70MB: DILocalVariable
>> > > > >>>   - ~50MB: (cumulative) DIType descendents
>> > > > >>>
>> > > > >>> It looks, surprisingly, like types are not the primary bottleneck.
>> > > >
>> > > > (Probably wrong for (a), BTW.  Caveats matter.)
>> > > >
>> > > > >>> There are caveats:
>> > > > >>>
>> > > > >>>   - `DISubprogram` declarations -- member function descriptors -- are
>> > > > >>>     part of the type hierarchy.
>> > > > >>>   - Most of the type hierarchy gets uniqued at parse time.
>> > > > >>>   - As a result, these data are a poor indicator for (a).
>> > > >
>> > > > ((a) is the main bottleneck for compile-time of -flto=thin (since it's
>> > > > quadratic in the number of files).  (b) only affects memory.  Also
>> > > > important, but at least it scales linearly.)
>> > > >
>> > > > >>> Even so, non-types are substantial.
>> > > > >>>
>> > > > >>> Proposal
>> > > > >>> ========
>> > > > >>>
>> > > > >>> Short version
>> > > > >>> -------------
>> > > > >>>
>> > > > >>>  4. Remove `DICompositeType`s from `retainedTypes:`, similar to (2).
>> > > >
>> > > > This is the part that's relevant to the new RFC Eric just posted.
>> > > >
>> > > > >>> Long version
>> > > > >>> -------------
>> > > > >>>
>> > > > >>>  4. Implement my proposal to remove the `DICompositeType` name map from
>> > > > >>>     `retainedTypes:`.
>> > > > >>>
>> > > > >>>     http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160125/327936.html
>> > > > >>>
>> > > > >>>     Similar to (2) above, this will naturally filter the types that get
>> > > > >>>     linked in to the ones actually used by the code being linked.
>> > > > >>>
>> > > > >>>     It should also allow the reader to skip records for types that have
>> > > > >>>     already been loaded in the main module.
>> > > >
>> > > > The essential things I want to accomplish with this part:
>> > > >
>> > > >   - Make `type:` operands less special: instead of referencing types
>> > > >     indirectly through MDString, point directly at the type node.
>> > > >
>> > > >   - Stop using `retainedTypes:` by default (only for -gfull, etc.).
>> > > >
>> > > >   - Avoid blowing up memory in -flto=full (which converting MDString
>> > > >     references back to pointers would do naively, through
>> > > >     re-introducing cycles).  Note that this needs to be handled
>> > > >     somehow at BitcodeReader time.
>> > > >
>> > > > After chatting with Eric, I don't think this conflicts at all with the
>> > > > other RFC.  Unifying the `type:` operands might actually help both.
>> > > >
>> > > > One good point David mentioned last week was that we don't want to
>> > > > teach the IR any more about types.  Rather than inventing some new
>> > > > context (as I suggested originally), I figure LTO plugins can just
>> > > > pass a (StringRef -> DIType*) map to the BitcodeReader, and the Module
>> > > > itself doesn't need to know anything about it.
> 

