[cfe-dev] RFC: Up front type information generation in clang and llvm

Tue Mar 29 19:31:59 PDT 2016

Thanks for sharing this. Mostly seems like a reasonable plan to me. A few
comments below.

On Tue, Mar 29, 2016 at 6:00 PM, Eric Christopher via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi All,
>
> This is something that's been talked about for some time and it's probably
> time to propose it.
>
> The "We" in this document is everyone on the cc line plus me.
>
> Please go ahead and take a look.
>
> Thanks!
>
> -eric
>
>
> Objective (and TL;DR)
> =================
>
> Migrate debug type information generation from the backends to the front
> end.
>
> This will enable:
> 1. Separation of concerns and maintainability: LLVM shouldn’t have to know
> about C preprocessor macros, Obj-C properties, or extensive details about
> debug information binary formats.
> 2. Performance: Skipping a serialization should speed up normal
> compilations.
> 3. Memory usage: The DI metadata structures are smaller than they were,
> but are still fairly large and pointer heavy.
>
> Motivation
> ========
>
> Currently, types in LLVM debug info are described by the DIType class
> hierarchy. This hierarchy evolved organically from a more flexible
> sea-of-nodes representation into what it is today - a large, only somewhat
> format neutral representation of debug types. Making this more format
> neutral will only increase the memory use - and for no reason as type
> information is static (or nearly so). Debug formats already have a
> memory-efficient serialization (their own binary format), so we should
> support a front end emitting type information with sufficient
> representation to allow the backend to emit debug information based on the
> more normal IR features: functions, scopes, variables, etc.
>
> Scope/Impact
> ===========
>
> This is going to involve large scale changes across both LLVM and clang.
> This will also affect any out-of-tree front ends; however, we expect the
> impact to be on the order of a large API change rather than needing massive
> infrastructure changes.
>
> Related work
> ==========
>
> This is related to the efforts to support CodeView in LLVM and clang as
> well as efforts to reduce overall memory consumption when compiling with
> debug information enabled, in particular efforts to prune LTO memory usage.
>
>
> Concerns
> ========
>
>
> We need a good story for transitioning all the debug info testcases in the
> backend without giving up coverage and/or readability. David believes he
> has a plan here.
>
> Proposal
> =======
>
> Short version
> -----------------
>
> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
> 3. Add an LLVM DWARF emission library similar to the existing CodeView one.
> 4. Migrate the Types API into a clang internal API taking clang AST
> structures and use the LLVM binary emission libraries to produce type
> information.
> 5. Remove the old binary emission out of LLVM.
>
>
> Questions/Thoughts/Elaboration
> -------------------------------------------
>
> Splitting the DIBuilder API
> ~~~~~~~~~~~~~~~~~~~~
> Will DISubprogram be part of both?
>    * We should split it in two: Full declarations with type and a slimmed
> down version with an abstract origin.
>
> How will we reference types in the DWARF blob?
>    * ODR types can be referenced by name
>    * Non-ODR types by full DWARF hash
>    * Each type can be a pair (tuple) of identifier (DITypeRef today) and
> blob.
>    * For versions before DWARF4 we can emit each type as a unit (but not a
> DWARF Type Unit) and use references and module relocations for the offsets.
> (See below)
>
> How will we handle references in DWARF2 or global relocations for non-type
> template parameters?
>    * We can use a “relocation” metadata as part of the format.
>    * Representable as a tuple holding the DIType and the offset within the
> DIBlob at which to write the final relocation/offset for the reference at
> emission time.
>
> Why break up the types at all?
>    * To enable non-debug format aware linking and type uniquing for LTO
> that won’t be huge in size. We break up the types so we don’t need to parse
> debug information to link two modules together efficiently.
>
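
To check my reading of the (identifier, blob) pairs and the "relocation" tuples above, here is a rough sketch in Python, used purely for illustration. Every name in it (`TypeEntry`, `non_odr_ident`, `link_modules`, `emit`) is hypothetical and not an LLVM API. The truncated MD5 only stands in for a real DWARF type signature, which hashes a flattened description of the type rather than raw bytes, and the 4-byte little-endian reference slot is an assumption.

```python
import hashlib
import struct
from dataclasses import dataclass, field


@dataclass
class TypeEntry:
    """One type: an identifier plus an opaque, pre-encoded blob."""
    ident: str     # ODR name, or a content hash for non-ODR types
    blob: bytes    # debug-format bytes; never parsed by the linker
    relocs: list = field(default_factory=list)  # (offset in blob, target ident)


def non_odr_ident(blob: bytes) -> str:
    # Assumption: first 8 bytes of MD5 over the raw bytes, as a stand-in hash.
    return hashlib.md5(blob).hexdigest()[:16]


def link_modules(*modules):
    """Unique types across modules by identifier alone; first definition wins."""
    merged = {}
    for module in modules:
        for entry in module:
            merged.setdefault(entry.ident, entry)
    return merged


def emit(types):
    """Lay the blobs out back to back, then patch each recorded reference slot."""
    offsets, cursor = {}, 0
    for ident, entry in types.items():
        offsets[ident] = cursor
        cursor += len(entry.blob)
    out = bytearray()
    for entry in types.values():
        data = bytearray(entry.blob)
        for slot, target in entry.relocs:
            # Assumed 4-byte little-endian offset of the referenced type.
            struct.pack_into("<I", data, slot, offsets[target])
        out += data
    return bytes(out), offsets


# Two modules that both describe Foo (an ODR type) and int (a non-ODR type).
int_blob = b"\x01int\x00"
int_id = non_odr_ident(int_blob)
foo_blob = b"\x02Foo\x00" + bytes(4)   # 4 placeholder bytes for the reference
mod_a = [TypeEntry("_ZTS3Foo", foo_blob, [(5, int_id)]), TypeEntry(int_id, int_blob)]
mod_b = [TypeEntry(int_id, int_blob), TypeEntry("_ZTS3Foo", foo_blob, [(5, int_id)])]
merged = link_modules(mod_a, mod_b)
section, offsets = emit(merged)
```

The property I care about is that `link_modules` never looks inside a blob. Only `emit` touches the bytes, and only at the offsets recorded in the relocation tuples.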

How do you plan to handle abbreviations? You wouldn't necessarily be able
to embed them directly in the blob, since when doing LTO each compilation
unit would have its own set of abbreviations. I suppose you could treat them
as a special sort of reference to an abbreviation table entry, or maybe
pre-allocate them in the frontend (though that would complicate
cross-frontend LTO), but I'm curious what you have in mind.
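
To make the "reference to an abbreviation table entry" option concrete, here is a small Python sketch. It assumes, hypothetically, that a frontend blob carries each DIE's abbreviation declaration symbolically and that whoever assembles the final unit assigns the numeric codes. `AbbrevTable` is made up for illustration and is not an existing LLVM interface; only the ULEB128 encoding and the DW_* constant values come from DWARF.

```python
DW_TAG_structure_type, DW_TAG_base_type = 0x13, 0x24
DW_AT_name, DW_AT_byte_size = 0x03, 0x0B
DW_FORM_string, DW_FORM_data1 = 0x08, 0x0B


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


class AbbrevTable:
    """Per-output-unit table that hands out abbreviation codes on demand."""

    def __init__(self):
        self._codes = {}

    def code_for(self, tag, has_children, attr_forms):
        # A declaration is identified by tag, children flag, and (attr, form) list.
        key = (tag, has_children, tuple(attr_forms))
        # Code 0 is reserved as the null entry, so numbering starts at 1.
        return self._codes.setdefault(key, len(self._codes) + 1)


# One table for the merged unit; blobs from any module resolve against it.
table = AbbrevTable()
base_code = table.code_for(
    DW_TAG_base_type, False,
    [(DW_AT_name, DW_FORM_string), (DW_AT_byte_size, DW_FORM_data1)])
struct_code = table.code_for(
    DW_TAG_structure_type, True, [(DW_AT_name, DW_FORM_string)])
encoded = uleb128(base_code)
```

One wrinkle with this approach: the code is ULEB128-encoded, so its width can change once a unit has 128 or more abbreviations, which would shift every offset after it inside the blob.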

> Any other concerns there?
>    * Debug information without type units might be slightly larger in this
> scheme due to parents being duplicated (declarations and abstract origin,
> not full parents). It may be possible to extend dsymutil/etc to merge all
> siblings into a common parent. Open question for better ways to solve this.
>

When we were thinking about teaching the backend to produce blobs from IR
metadata, we considered cases where the debug info emitter would discover
special member functions during IR traversal. I guess since we're moving all
of that to the frontend, we can just ask the frontend directly which special
members are needed on the class. That solves the problem for a single
translation unit. But what do you plan to do in the multiple translation
unit case, where two TUs declare different special members on a class? Would
it be fine to just emit the two definitions and let the debugger sort it
out? I guess this is the type of thing that debuggers normally deal with in
the non-LTO case, so I suppose so?
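
For illustration, here is one way a consumer could reconcile two definitions of the same class that list different special members. This is a Python sketch under the assumption that definitions are matched by name and that taking the union of member declarations is acceptable. It does not describe how any particular debugger behaves, and `ClassDef` and `merge_definitions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassDef:
    name: str
    members: tuple   # member declarations seen by one translation unit


def merge_definitions(defs):
    """Union the members of same-named definitions, keeping first-seen order."""
    merged = {}
    for d in defs:
        seen = merged.setdefault(d.name, [])
        for m in d.members:
            if m not in seen:
                seen.append(m)
    return {name: ClassDef(name, tuple(ms)) for name, ms in merged.items()}


# TU 1 needed the default constructor; TU 2 needed the copy constructor.
tu1 = ClassDef("Foo", ("int x", "Foo()"))
tu2 = ClassDef("Foo", ("int x", "Foo(const Foo&)"))
combined = merge_definitions([tu1, tu2])
```

Both definitions stay in the output as emitted; the merge happens only on the consumer side.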

> How should we handle DWARF5/Apple Accelerator Tables?
>    * Thoughts:
>    * We can parse the DWARF in the back end and generate them.
>    * We can emit in the front end for the base case of non-LTO (with help
> from the backend for relocation aspects).
>    * We can use dsymutil on LTO debug information to generate them.
>
> Why isn’t this a more detailed spec?
>    * Mostly because we’ve thought about the issues, but we can’t plan for
> everything during implementation.
>
>
> Future work
> ----------------
>
> Not contained as part of this, but an obvious future direction is that the
> Module linker could grow support for debug aware linking. Then we can have
> all of the type information for a single translation unit in a single blob
> and use the debug aware linking to handle merging types.
>

-- 
Peter