[llvm-dev] [RFC] Lazy-loading of debug info metadata

Adrian Prantl via llvm-dev llvm-dev at lists.llvm.org
Fri Apr 15 09:57:44 PDT 2016


> On Apr 15, 2016, at 9:50 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
> Since I haven't heard any objections to the direction, I'm planning to
> commit this (step 4) when I find some cycles; likely over the weekend.
> 
> To make this more concrete (in case someone is silently concerned) I've
> posted my WIP patches below.  They apply cleanly to r266414.  There are
> a few mechanical changes missing that are tracked in the commit
> messages (such as LangRef, or new tests in some cases).  To get
> `ninja check` to pass after the final commit you'll need to run the
> attached upgrade script.  The clang changes are truly uninteresting so
> I've left them out.
> 
>  - 0001/2: Prep commits I need to flush out (sorry for the noise).
>  - 0003: Add an explicit type map for ODR type uniquing.
>  - 0004: Prep.
>  - 0005: Add ODR type uniquing of members.
>  - 0006: Remove DITypeRef (string-based references) and strip down
>    'retainedTypes:’.

It would be great if you could document how ODR type uniquing will work in SourceLevelDebugging.rst, too.

-- adrian

> 
> Once this is done, I expect the bitcode block layout improvements for
> lazy-loading of subprograms and composite types (steps 3 and 5 from
> this RFC) to be fairly straightforward.
> 
> <0001-LTO-Use-a-common-LibLTOCodeGenerator-init-NFC.patch><0002-IR-Remove-extra-blank-line-NFC.patch><0003-WIP-IR-Use-an-explicit-map-for-debug-info-type-uniqu.patch><0004-IR-Add-a-configuration-point-for-MDNodeInfo-isEqual-.patch><0005-WIP-IR-Use-ODR-to-unique-DICompositeType-members.patch><0006-WIP-IR-Remove-DITypeRef-DIScopeRef-and-DINodeRef.patch><deref-typerefs.sh>
> 
>> On 2016-Mar-29, at 19:11, Eric Christopher <echristo at gmail.com> wrote:
>> 
>> I have no objections to any of this FWIW :)
>> 
>> -eric
>> 
>> On Tue, Mar 29, 2016 at 6:46 PM Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> 
>>> On 2016-Mar-22, at 20:11, Eric Christopher <echristo at gmail.com> wrote:
>>> 
>>>> On Tue, Mar 22, 2016 at 8:04 PM David Blaikie <dblaikie at gmail.com> wrote:
>>>> +pcc, who had some other ideas/patch out for improving memory usage of debug info
>>>> +Reid, who's responsible for the windows/CodeView/PDB debug info which is motivating some of the ideas about changes to type emission
>>> 
>>> So I discussed this with Adrian and Mehdi at the social last Thursday and I'm getting set to finish the write up. I think it'll have some bearing on this proposal as I think it'll change how we want to take a look at the format of DISubprogram metadata a bit more.
>> 
>> (The interesting bit here is to make a clearer split between
>> DISubprogram declarations (part of the type hierarchY) and
>> DISubprogram definitions (part of the code/line table/variable
>> locations).  I think that'll end up being mostly orthogonal to what
>> I'm trying to do.)
>> 
>>> That said, most of it is orthogonal to the changes Duncan is talking about here. Just puts the pressure on to get the other proposal written up.
>> 
>> Which is now here:
>> http://lists.llvm.org/pipermail/llvm-dev/2016-March/097773.html
>> 
>>>> Baking into the IR more about types as units has pretty direct overlap with Reid/CodeView/etc - so, yeah, that'll takes ome discussion (but, as you say, it's not in your immediate plan anyway, so we can come back to that - but would be good for whoever gets there first to discuss it with the others)
>> 
>> After thinking for a few days, I don't think this will bake anything
>> new into the IR.  If anything it removes a few special cases.
>> 
>> More at the bottom.
>> 
>>>>> Motivation
>>>>> ==========
>>>>> 
>>>>> Based on some analysis Mehdi ran (ping him for details), there are three
>>>>> (related) compile-time bottlenecks we're seeing with `-flto=thin -g`:
>>>>> 
>>>>> a) Reading the large number of Metadata bitcode records in the global
>>>>>    metadata block.  I'm talking about raw `BitStreamer` calls here.
>>>>> 
>>>>> b) Creating unnecessary `DI*` instances (that aren't relevant to code).
>>>>> 
>>>>> c) Emitting unnecessary `DI*` instances (that aren't relevant to code).
>>>>> 
>>>>> Here is my recollection of some peak memory stats on a small testcase
>>>>> during thin-LTO, which should be a decent indicator of (b):
>>>>> 
>>>>>  - ~150MB: DILocation
>>>>>  - ~100MB: DISubprogram
>>>>>  - ~70MB: DILocalVariable
>>>>>  - ~50MB: (cumulative) DIType descendents
>>>>> 
>>>>> It looks, suprisingly, like types are not the primary bottleneck.
>> 
>> (Probably wrong for (a), BTW.  Caveats matter.)
>> 
>>>>> There are caveats:
>>>>> 
>>>>>  - `DISubprogram` declarations -- member function descriptors -- are
>>>>>    part of the type hierarchy.
>>>>>  - Most of the type hierarchy gets uniqued at parse time.
>>>>>  - As a result, these data are a poor indicator for (a).
>> 
>> ((a) is the main bottleneck for compile-time of -flto=thin (since it's
>> quadratic in the number of files).  (b) only affects memory.  Also
>> important, but at least it scales linearly.)
>> 
>>>>> Even so, non-types are substantial.
>>>>> 
>>>>> Proposal
>>>>> ========
>>>>> 
>>>>> Short version
>>>>> -------------
>>>>> 
>>>>> 4. Remove `DICompositeType`s from `retainedTypes:`, similar to (2).
>> 
>> This is the part that's relevant to the new RFC Eric just posted.
>> 
>>>>> Long version
>>>>> -------------
>>>>> 
>>>>> 4. Implement my proposal to remove the `DICompositeType` name map from
>>>>>    `retainedTypes:`.
>>>>> 
>>>>>    http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160125/327936.html
>>>>> 
>>>>>    Similar to (2) above, this will naturally filter the types that get
>>>>>    linked in to the ones actually used by the code being linked.
>>>>> 
>>>>>    It should also allow the reader to skip records for types that have
>>>>>    already been loaded in the main module.
>> 
>> The essential things I want to accomplish with this part:
>> 
>>  - Make `type:` operands less special: instead of referencing types
>>    indirectly through MDString, point directly at the type node.
>> 
>>  - Stop using `retainedTypes:` by default (only for -gfull, etc.).
>> 
>>  - Avoid blowing up memory in -flto=full (which converting MDString
>>    references back to pointers would do naively, through
>>    re-introducing cycles).  Note that this needs to be handled
>>    somehow at BitcodeReader time.
>> 
>> After chatting with Eric, I don't think this conflicts at all with the
>> other RFC.  Unifying the `type:` operands might actually help both.
>> 
>> One good point David mentioned last week was that we don't want to
>> teach the IR any more about types.  Rather than inventing some new
>> context (as I suggested originally), I figure LTO plugins can just
>> pass a (StringRef -> DIType*) map to the BitcodeReader, and the Module
>> itself doesn't need to know anything about it.
> 



More information about the llvm-dev mailing list