[cfe-dev] RFC: Up front type information generation in clang and llvm

Tue Mar 29 18:03:29 PDT 2016

(To be clear: Reid, Adrian, Duncan, Dave, and myself.)

On Tue, Mar 29, 2016 at 6:00 PM Eric Christopher <echristo at gmail.com> wrote:

> Hi All,
>
> This is something that's been talked about for some time and it's probably
> time to propose it.
>
> The "We" in this document is everyone on the cc line plus me.
>
> Please go ahead and take a look.
>
> Thanks!
>
> -eric
>
>
> Objective (and TL;DR)
> =================
>
> Migrate debug type information generation from the backends to the front
> end.
>
> This will enable:
> 1. Separation of concerns and maintainability: LLVM shouldn’t have to know
> about C preprocessor macros, Obj-C properties, or extensive details about
> debug information binary formats.
> 2. Performance: Skipping a serialization should speed up normal
> compilations.
> 3. Memory usage: The DI metadata structures are smaller than they were,
> but are still fairly large and pointer heavy.
>
> Motivation
> ========
>
> Currently, types in LLVM debug info are described by the DIType class
> hierarchy. This hierarchy evolved organically from a more flexible
> sea-of-nodes representation into what it is today: a large, only somewhat
> format-neutral representation of debug types. Making it more format
> neutral would only increase memory use, and for no benefit, since type
> information is static (or nearly so). Debug formats already have a
> memory-efficient serialization: their own binary format. We should
> therefore support a front end emitting type information directly, with
> enough representation for the backend to emit debug information based on
> the more normal IR features: functions, scopes, variables, etc.
>
> Scope/Impact
> ===========
>
> This is going to involve large-scale changes across both LLVM and clang.
> This will also affect any out-of-tree front ends. However, we expect the
> impact to be on the order of a large API change rather than a massive
> infrastructure change.
>
> Related work
> ==========
>
> This is related to the efforts to support CodeView in LLVM and clang, as
> well as efforts to reduce overall memory consumption when compiling with
> debug information enabled, in particular efforts to prune LTO memory usage.
>
>
> Concerns
> ========
>
>
> We need a good story for transitioning all the debug info testcases in the
> backend without giving up coverage and/or readability. David believes he
> has a plan here.
>
> Proposal
> =======
>
> Short version
> -----------------
>
> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
> 3. Add an LLVM DWARF emission library similar to the existing CodeView one.
> 4. Migrate the Types API into a clang internal API taking clang AST
> structures and use the LLVM binary emission libraries to produce type
> information.
> 5. Remove the old binary emission out of LLVM.
>
>
> Questions/Thoughts/Elaboration
> -------------------------------------------
>
> Splitting the DIBuilder API
> ~~~~~~~~~~~~~~~~~~~~
> Will DISubprogram be part of both?
>    * We should split it in two: Full declarations with type and a slimmed
> down version with an abstract origin.
>
> How will we reference types in the DWARF blob?
>    * ODR types can be referenced by name.
>    * Non-ODR types can be referenced by full DWARF hash.
>    * Each type can be a pair (tuple) of identifier (DITypeRef today) and
> blob.
>    * For versions before DWARF4 we can emit each type as its own unit
> (though not a DWARF Type Unit) and use references and module relocations
> for the offsets. (See below.)
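
A minimal sketch of the identifier/blob pairing described above, written in Python purely for illustration. The names `TypeEntry` and `make_type_entry` are hypothetical, and the truncated MD5 digest is only a stand-in for whatever "full DWARF hash" ends up meaning in the implementation; it is not the DWARF type signature algorithm.

```python
import hashlib
from typing import NamedTuple, Optional


class TypeEntry(NamedTuple):
    identifier: str  # name for ODR types, content hash otherwise
    blob: bytes      # opaque, already-serialized type description


def make_type_entry(blob: bytes, odr_name: Optional[str] = None) -> TypeEntry:
    """Pair a serialized type blob with the identifier used to reference it."""
    if odr_name is not None:
        # ODR types: the name is assumed to be unique across the program.
        return TypeEntry(odr_name, blob)
    # Non-ODR types: identify by a hash of the contents. Truncated MD5 is a
    # stand-in here, not the DWARF type signature algorithm.
    digest = hashlib.md5(blob).hexdigest()[:16]
    return TypeEntry("hash:" + digest, blob)
```

Two non-ODR types with byte-identical blobs get the same identifier under this scheme, which is what would let them be uniqued without decoding the blob.
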
>
> How will we handle references in DWARF2 or global relocations for non-type
> template parameters?
>    * We can use a “relocation” metadata as part of the format.
>    * It is representable as a tuple holding the DIType and the offset
> within the DIBlob at which to write the final relocation/offset for the
> reference at emission time.
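
One way to picture the relocation tuple, as a rough Python sketch rather than a proposed format. `Reloc` and `emit` are hypothetical names, and the 4-byte little-endian offset encoding is an assumption made only so the example runs; the real reference form would depend on the DWARF version and target.

```python
import struct
from typing import Dict, List, NamedTuple, Tuple


class Reloc(NamedTuple):
    target: str  # identifier of the referenced type
    offset: int  # byte offset inside the referencing blob to patch


def emit(types: Dict[str, Tuple[bytes, List[Reloc]]]) -> Tuple[bytes, Dict[str, int]]:
    """Concatenate blobs, then write each target's final offset in place."""
    # First pass: assign every type blob its final position in the section.
    position: Dict[str, int] = {}
    cursor = 0
    for ident, (blob, _) in types.items():
        position[ident] = cursor
        cursor += len(blob)

    # Second pass: copy the blobs and resolve the recorded relocations.
    section = bytearray()
    for ident, (blob, relocs) in types.items():
        patched = bytearray(blob)
        for r in relocs:
            # Assumed encoding: 4-byte little-endian section offset.
            struct.pack_into("<I", patched, r.offset, position[r.target])
        section += patched
    return bytes(section), position
```

The front end only records where a reference lives; the final value is filled in once the layout of all blobs is known.
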
>
> Why break up the types at all?
>    * To enable linking and type uniquing for LTO that is not debug-format
> aware and that won’t be huge in size. We break up the types so we don’t
> need to parse debug information to link two modules together efficiently.
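
To illustrate why per-type blobs make this cheap, here is a small Python sketch under the assumption that each module carries a table mapping type identifiers to opaque blobs. The function name `link_type_tables` and the table shape are hypothetical.

```python
from typing import Dict


def link_type_tables(a: Dict[str, bytes], b: Dict[str, bytes]) -> Dict[str, bytes]:
    """Merge two modules' type tables, keeping one blob per identifier."""
    merged = dict(a)
    for ident, blob in b.items():
        # The blob is never decoded; the identifier alone decides uniqueness.
        merged.setdefault(ident, blob)
    return merged
```

The linker here never looks inside a blob, so it needs no knowledge of DWARF or CodeView to deduplicate types.
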
>
> Any other concerns there?
>    * Debug information without type units might be slightly larger in this
> scheme due to parents being duplicated (declarations and abstract origin,
> not full parents). It may be possible to extend dsymutil/etc. to merge all
> siblings into a common parent. Better ways to solve this are an open
> question.
>
> How should we handle DWARF5/Apple Accelerator Tables?
>    * Thoughts:
>    * We can parse the DWARF in the back end and generate them.
>    * We can emit in the front end for the base case of non-LTO (with help
> from the backend for relocation aspects).
>    * We can use dsymutil on LTO debug information to generate them.
>
> Why isn’t this a more detailed spec?
>    * Mostly because, although we’ve thought about the issues, we can’t
> plan for everything that will come up during implementation.
>
>
> Future work
> ----------------
>
> Not part of this proposal, but an obvious future direction, is that the
> Module linker could grow support for debug-aware linking. Then we can have
> all of the type information for a single translation unit in a single blob
> and use the debug-aware linking to handle merging types.
>