[llvm-dev] RFC: Up front type information generation in clang and llvm

Wed Apr 27 16:51:38 PDT 2016

Somehow I managed to respond without being explicit about the difference
between your design and mine: I'm saying we should just have one type blob
per TU. This will avoid the need for cross-blob references, but it will
necessitate format-aware type handling during LTO and LTO-like use-cases
(ThinLTO, llvm-extract, etc).

On Wed, Apr 27, 2016 at 4:41 PM, Reid Kleckner <rnk at google.com> wrote:

> My general feeling is that this design represents a mid-point between our
> current metadata design, and a future design where frontends just emit type
> information and LTO links it in a format-aware way.
>
> I don't think it's an imminent priority for anyone to do this for DWARF,
> so I worry that if we start building infrastructure for it, it will end up
> overengineered.
>
> Also, people seem to agree that in the long term, we really need a
> format-aware linker, and maybe LTO should just use one. Supposedly Frédéric
> has patches to llvm-dsymutil to make one for DWARF, but he hasn't found the
> time to upstream them.
>
> Together, these reasons make me feel that we should limit the short-term
> scope to just CodeView, and add utilities to lib/Linker for performing
> basic tasks like type stream merging or type extraction, possibly with
> forward declaration of composite types.
>
> In the future, when we do this work for DWARF, we can add a new DIType*
> stand-in similar to what you are describing.
>
> The working patch that I have for just CodeView, all types as a single
> blob, is up here: http://reviews.llvm.org/D19236. While it doesn't deal
> with type blobs or LTO type merging yet, I think it shows that there is
> surprisingly little need to bifurcate other parts of LLVM.
>
> Thoughts?
>
> On Tue, Mar 29, 2016 at 6:00 PM, Eric Christopher <echristo at gmail.com>
> wrote:
>
>> Hi All,
>>
>> This is something that's been talked about for some time and it's
>> probably time to propose it.
>>
>> The "We" in this document is everyone on the cc line plus me.
>>
>> Please go ahead and take a look.
>>
>> Thanks!
>>
>> -eric
>>
>>
>> Objective (and TL;DR)
>> =================
>>
>> Migrate debug type information generation from the backends to the front
>> end.
>>
>> This will enable:
>> 1. Separation of concerns and maintainability: LLVM shouldn’t have to
>> know about C preprocessor macros, Obj-C properties, or extensive details
>> about debug information binary formats.
>> 2. Performance: Skipping a serialization should speed up normal
>> compilations.
>> 3. Memory usage: The DI metadata structures are smaller than they were,
>> but are still fairly large and pointer heavy.
>>
>> Motivation
>> ========
>>
>> Currently, types in LLVM debug info are described by the DIType class
>> hierarchy. This hierarchy evolved organically from a more flexible
>> sea-of-nodes representation into what it is today - a large, only somewhat
>> format neutral representation of debug types. Making this more format
>> neutral will only increase the memory use, and for no reason, as type
>> information is static (or nearly so). Debug formats already have a
>> memory-efficient serialization: their own binary format. We should
>> therefore support a front end emitting type information with sufficient
>> representation to allow the backend to emit debug information based on the
>> more normal IR features: functions, scopes, variables, etc.
>>
>> Scope/Impact
>> ===========
>>
>> This is going to involve large scale changes across both LLVM and clang.
>> This will also affect any out-of-tree front ends; however, we expect the
>> impact to be on the order of a large API change rather than needing massive
>> infrastructure changes.
>>
>> Related work
>> ==========
>>
>> This is related to the efforts to support CodeView in LLVM and clang as
>> well as efforts to reduce overall memory consumption when compiling with
>> debug information enabled, in particular efforts to prune LTO memory usage.
>>
>>
>> Concerns
>> ========
>>
>>
>> We need a good story for transitioning all the debug info testcases in
>> the backend without giving up coverage and/or readability. David believes
>> he has a plan here.
>>
>> Proposal
>> =======
>>
>> Short version
>> -----------------
>>
>> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line
>> Table.
>> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
>> 3. Add an LLVM DWARF emission library similar to the existing CodeView one.
>> 4. Migrate the Types API into a clang internal API taking clang AST
>> structures and use the LLVM binary emission libraries to produce type
>> information.
>> 5. Remove the old binary emission out of LLVM.
>>
>>
>> Questions/Thoughts/Elaboration
>> -------------------------------------------
>>
>> Splitting the DIBuilder API
>> ~~~~~~~~~~~~~~~~~~~~
>> Will DISubprogram be part of both?
>>    * We should split it in two: Full declarations with type and a slimmed
>> down version with an abstract origin.
>>
>> How will we reference types in the DWARF blob?
>>    * ODR types can be referenced by name
>>    * Non-ODR types can be referenced by full DWARF hash
>>    * Each type can be a pair (tuple) of identifier (DITypeRef today) and
>> blob.
>>    * For versions before DWARF4, we can emit each type as a unit (but not
>> as a DWARF Type Unit) and use references and module relocations for the
>> offsets. (See below)
>>
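As an illustration of the referencing scheme in the quoted answer above, here is a minimal Python sketch, not LLVM code. It assumes each type is an (identifier, opaque blob) pair, that ODR types use a name as their identifier, and that non-ODR types use a hash. The names `odr_entry`, `non_odr_entry`, and `link_types` are hypothetical, and SHA-256 over the raw blob bytes is only a stand-in for the "full DWARF hash", which the proposal does not spell out.

```python
import hashlib
from typing import Dict, List, Tuple

# Assumption: a type is an (identifier, opaque blob) pair.
TypeEntry = Tuple[str, bytes]


def odr_entry(name: str, blob: bytes) -> TypeEntry:
    """ODR type: the identifier is a name (for example a mangled name)."""
    return (name, blob)


def non_odr_entry(blob: bytes) -> TypeEntry:
    """Non-ODR type: the identifier is a hash.

    SHA-256 of the blob bytes is a placeholder for the DWARF hash.
    """
    return ("hash:" + hashlib.sha256(blob).hexdigest(), blob)


def link_types(*modules: List[TypeEntry]) -> Dict[str, bytes]:
    """Unique types across modules by identifier only.

    The blobs are never parsed; the first blob seen for an identifier wins.
    """
    merged: Dict[str, bytes] = {}
    for module in modules:
        for ident, blob in module:
            merged.setdefault(ident, blob)
    return merged
```

In this sketch the linker only compares identifiers, which is the property that lets two modules be merged without any knowledge of the debug format inside the blobs.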
>> How will we handle references in DWARF2 or global relocations for
>> non-type template parameters?
>>    * We can use a “relocation” metadata as part of the format.
>>    * Representable as a tuple of the DIType and the offset within the
>> DIBlob at which to write the final relocation/offset for the reference
>> at emission time.
>>
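To make the "relocation" tuple in the quoted answer above concrete, here is a minimal Python sketch under stated assumptions. It models a relocation as (target type identifier, byte offset within the blob) and patches a 4-byte little-endian section offset at emission time. `Reloc`, `TypeBlob`, and `emit` are hypothetical names, and the 4-byte little-endian encoding is an assumption made for the example, not something the proposal specifies.

```python
import struct
from typing import Dict, List, NamedTuple


class Reloc(NamedTuple):
    target: str  # identifier of the referenced type
    offset: int  # byte offset within the blob where the reference is written


class TypeBlob(NamedTuple):
    ident: str
    data: bytes
    relocs: List[Reloc]


def emit(blobs: List[TypeBlob]) -> bytes:
    """Concatenate blobs and resolve every relocation to a section offset."""
    # First pass: decide where each blob starts in the output section.
    starts: Dict[str, int] = {}
    pos = 0
    for blob in blobs:
        starts[blob.ident] = pos
        pos += len(blob.data)

    # Second pass: copy each blob and patch in the final offsets.
    out = bytearray()
    for blob in blobs:
        buf = bytearray(blob.data)
        for reloc in blob.relocs:
            # Assumption: references are 4-byte little-endian offsets.
            struct.pack_into("<I", buf, reloc.offset, starts[reloc.target])
        out += buf
    return bytes(out)
```

The two-pass layout means a blob can reference a type that is emitted later in the section, since all start offsets are known before any patching happens.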
>> Why break up the types at all?
>>    * To enable linking and type uniquing for LTO that is not debug-format
>> aware and that won’t be huge in size. We break up the types so we don’t
>> need to parse debug information to link two modules together efficiently.
>>
>> Any other concerns there?
>>    * Debug information without type units might be slightly larger in
>> this scheme due to parents being duplicated (declarations and abstract
>> origin, not full parents). It may be possible to extend dsymutil/etc to
>> merge all siblings into a common parent. Open question for better ways to
>> solve this.
>>
>> How should we handle DWARF5/Apple Accelerator Tables?
>>    * Thoughts:
>>    * We can parse the DWARF in the back end and generate them.
>>    * We can emit in the front end for the base case of non-LTO (with help
>> from the backend for relocation aspects).
>>    * We can use dsymutil on LTO debug information to generate them.
>>
>> Why isn’t this a more detailed spec?
>>    * Mostly because we’ve thought about the issues, but we can’t plan for
>> everything during implementation.
>>
>>
>> Future work
>> ----------------
>>
>> Not contained as part of this, but an obvious future direction is that
>> the Module linker could grow support for debug aware linking. Then we can
>> have all of the type information for a single translation unit in a single
>> blob and use the debug aware linking to handle merging types.
>>
>
>