[cfe-dev] RFC: Up front type information generation in clang and llvm

Wed Mar 30 11:36:39 PDT 2016

On Wed, Mar 30, 2016 at 8:40 AM Adrian Prantl <aprantl at apple.com> wrote:

> On Mar 29, 2016, at 11:35 PM, mats petersson via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> How will this affect other languages that generate debug info - not that
> you should care about those, I'm just curious - my Pascal compiler does not
> generate clang-style AST, and does not use clang at all. I currently have
> code that uses DIBuilder directly...
>
>
> I don’t think that the code for generating DWARF types should move into
> Clang, but rather in a separate library that can be shared by multiple
> frontends. It can even keep most of the existing DIBuilder interface (but
> we may need to split DIBuilder in a types vs. everything else part).
>
>
There will need to be an API split between front end and backend. We can
attempt to keep a lot of the current DIBuilder interface, but it's going to
make sense to have a split in the front end as well that can directly call
an emission library. Ideally we'll make it look a lot more like dwarf than
the current abstract interface, but we'll see.

-eric

> -- adrian
>
>
> --
> Mats
>
> On 30 March 2016 at 04:15, Eric Christopher via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Tue, Mar 29, 2016 at 8:11 PM Peter Collingbourne <peter at pcc.me.uk>
>> wrote:
>>
>>> On Tue, Mar 29, 2016 at 7:43 PM, Eric Christopher <echristo at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Mar 29, 2016 at 7:31 PM Peter Collingbourne <peter at pcc.me.uk>
>>>> wrote:
>>>>
>>>>> Thanks for sharing this. Mostly seems like a reasonable plan to me. A
>>>>> few comments below.
>>>>>
>>>>>
>>>> Thanks Peter!
>>>>
>>>>
>>>>> On Tue, Mar 29, 2016 at 6:00 PM, Eric Christopher via cfe-dev <
>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> This is something that's been talked about for some time and it's
>>>>>> probably time to propose it.
>>>>>>
>>>>>> The "We" in this document is everyone on the cc line plus me.
>>>>>>
>>>>>> Please go ahead and take a look.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -eric
>>>>>>
>>>>>>
>>>>>> Objective (and TL;DR)
>>>>>> =================
>>>>>>
>>>>>> Migrate debug type information generation from the backends to the
>>>>>> front end.
>>>>>>
>>>>>> This will enable:
>>>>>> 1. Separation of concerns and maintainability: LLVM shouldn’t have to
>>>>>> know about C preprocessor macros, Obj-C properties, or extensive details
>>>>>> about debug information binary formats.
>>>>>> 2. Performance: Skipping a serialization should speed up normal
>>>>>> compilations.
>>>>>> 3. Memory usage: The DI metadata structures are smaller than they
>>>>>> were, but are still fairly large and pointer heavy.
>>>>>>
>>>>>> Motivation
>>>>>> ========
>>>>>>
>>>>>> Currently, types in LLVM debug info are described by the DIType class
>>>>>> hierarchy. This hierarchy evolved organically from a more flexible
>>>>>> sea-of-nodes representation into what it is today - a large, only somewhat
>>>>>> format neutral representation of debug types. Making this more format
>>>>>> neutral will only increase the memory use - and for no reason as type
>>>>>> information is static (or nearly so). Debug formats already have a memory
>>>>>> efficient serialization (their own binary format), so we should support a
>>>>>> front end emitting type information with sufficient representation to allow
>>>>>> the backend to emit debug information based on the more normal IR features:
>>>>>> functions, scopes, variables, etc.
>>>>>>
>>>>>> Scope/Impact
>>>>>> ===========
>>>>>>
>>>>>> This is going to involve large scale changes across both LLVM and
>>>>>> clang. This will also affect any out-of-tree front ends, however, we expect
>>>>>> the impact to be on the order of a large API change rather than needing
>>>>>> massive infrastructure changes.
>>>>>>
>>>>>> Related work
>>>>>> ==========
>>>>>>
>>>>>> This is related to the efforts to support CodeView in LLVM and clang
>>>>>> as well as efforts to reduce overall memory consumption when compiling with
>>>>>> debug information enabled; in particular, efforts to prune LTO memory usage.
>>>>>>
>>>>>>
>>>>>> Concerns
>>>>>> ========
>>>>>>
>>>>>>
>>>>>> We need a good story for transitioning all the debug info testcases
>>>>>> in the backend without giving up coverage and/or readability. David
>>>>>> believes he has a plan here.
>>>>>>
>>>>>> Proposal
>>>>>> =======
>>>>>>
>>>>>> Short version
>>>>>> -----------------
>>>>>>
>>>>>> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line
>>>>>> Table.
>>>>>> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
>>>>>> 3. Add an LLVM DWARF emission library similar to the existing CodeView
>>>>>> one.
>>>>>> 4. Migrate the Types API into a clang internal API taking clang AST
>>>>>> structures and use the LLVM binary emission libraries to produce type
>>>>>> information.
>>>>>> 5. Remove the old binary emission out of LLVM.
>>>>>>
>>>>>>
>>>>>> Questions/Thoughts/Elaboration
>>>>>> -------------------------------------------
>>>>>>
>>>>>> Splitting the DIBuilder API
>>>>>> ~~~~~~~~~~~~~~~~~~~~
>>>>>> Will DISubprogram be part of both?
>>>>>>    * We should split it in two: Full declarations with type and a
>>>>>> slimmed down version with an abstract origin.
>>>>>>
>>>>>> How will we reference types in the DWARF blob?
>>>>>>    * ODR types can be referenced by name
>>>>>>    * Non-ODR types can be referenced by full DWARF hash
>>>>>>    * Each type can be a pair (tuple) of identifier (DITypeRef today)
>>>>>> and blob.
>>>>>>    * For DWARF versions before 4, we can emit each type as a unit (but
>>>>>> not as a DWARF Type Unit) and use references and module relocations for
>>>>>> the offsets. (See below)
>>>>>>
>>>>>> How will we handle references in DWARF2 or global relocations for
>>>>>> non-type template parameters?
>>>>>>    * We can use a “relocation” metadata as part of the format.
>>>>>>    * Representable as a tuple that has the DIType and the offset
>>>>>> within the DIBlob as where to write the final relocation/offset for the
>>>>>> reference at emission time.
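
A minimal sketch of the "relocation" tuple idea above, under loud assumptions: `TypeBlob` and `emit` are hypothetical names, the reference slot is assumed to be 4 bytes little-endian, and the byte strings are placeholders rather than real DWARF encodings. It only illustrates recording (target type, offset within blob) and patching the slot once final offsets are known.

```python
import struct
from dataclasses import dataclass, field

@dataclass
class TypeBlob:
    type_id: str      # identifier of this type (hypothetical)
    data: bytearray   # pre-encoded bytes with zeroed reference slots
    relocs: list = field(default_factory=list)  # (target type_id, offset in data)

def emit(blobs):
    """Lay blobs out back to back, then patch each 4-byte reference slot."""
    # First pass: assign every type its final offset in the output.
    final_offset = {}
    cursor = 0
    for b in blobs:
        final_offset[b.type_id] = cursor
        cursor += len(b.data)
    # Second pass: write the resolved offsets into copies of the blobs.
    out = bytearray()
    for b in blobs:
        patched = bytearray(b.data)
        for target, off in b.relocs:
            struct.pack_into("<I", patched, off, final_offset[target])
        out += patched
    return bytes(out)

# Placeholder bytes only; "int*" refers to "int" through the slot at offset 1.
int_blob = TypeBlob("int", bytearray(b"\x01\x04\x05"))
ptr_blob = TypeBlob("int*", bytearray(b"\x02\x00\x00\x00\x00"), [("int", 1)])
section = emit([ptr_blob, int_blob])
```

Here the blob contents are never parsed; only the recorded offsets are touched at emission time.
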
>>>>>>
>>>>>> Why break up the types at all?
>>>>>>    * To enable non-debug-format-aware linking and type uniquing for
>>>>>> LTO that won’t be huge in size. We break up the types so we don’t need to
>>>>>> parse debug information to link two modules together efficiently.
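
As a rough illustration of why per-type blobs help here, the sketch below merges two modules' type tables purely by identifier. Everything in it is an assumption for illustration: `type_identifier` and `merge_type_tables` are hypothetical names, SHA-256 over the raw bytes is only a stand-in for the "full DWARF hash", and the blob bytes and the `_ZTS3Foo` name are placeholders.

```python
import hashlib

def type_identifier(blob, odr_name=None):
    """ODR types are keyed by name; anything else by a hash of its bytes."""
    if odr_name is not None:
        return "odr:" + odr_name
    # Stand-in hash, not the hash a DWARF producer would actually compute.
    return "hash:" + hashlib.sha256(blob).hexdigest()[:16]

def merge_type_tables(*modules):
    """Unique types across modules without looking inside any blob."""
    merged = {}
    for types in modules:
        for ident, blob in types.items():
            merged.setdefault(ident, blob)  # first definition wins
    return merged

foo = b"\x13Foo\x00"     # placeholder bytes for an ODR type
anon = b"\x24\x05\x04"   # placeholder bytes for a non-ODR type
module_a = {type_identifier(foo, "_ZTS3Foo"): foo, type_identifier(anon): anon}
module_b = {type_identifier(foo, "_ZTS3Foo"): foo}
merged = merge_type_tables(module_a, module_b)
```

The merge is a dictionary union keyed on the identifier, so the linker never needs to understand the debug format.
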
>>>>>>
>>>>>
>>>>> How do you plan to handle abbreviations? You wouldn't necessarily be
>>>>> able to embed them directly in the blob, as when doing LTO each compilation
>>>>> unit would have its own set of abbreviations. I suppose you could do
>>>>> something like treat them as a special sort of reference to an abbreviation
>>>>> table entry, or maybe pre-allocate in the frontend (but would complicate
>>>>> cross-frontend LTO) but curious what you have in mind.
>>>>>
>>>>
>>>> Thanks for reminding me, I knew I was forgetting something I'd talked
>>>> about when writing all of this down. :)
>>>>
>>>> Basically, to handle abbreviations you can treat them similarly to
>>>> types by creating a blob with an index/hash/etc and then reference that as
>>>> part of the type tuple, e.g.:
>>>>
>>>> $1 = { DIAbbrev: 0x1234, DIBlob: <blah> }
>>>> $2 = { DIType: <ID>, DIAbbrev: $1, DIBlob: <blah> }
>>>>
>>>> and keep them uniqued during emission and remember to merge these as
>>>> well during module merge time.
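
One way to picture "keep them uniqued during emission" is a small interning table, sketched below. `AbbrevTable` and `assign_codes` are hypothetical names, and the abbreviation and DIE byte strings are placeholders. Codes start at 1 because DWARF reserves abbreviation code 0 for null entries.

```python
class AbbrevTable:
    """Uniques abbreviation blobs by content and hands out 1-based codes."""

    def __init__(self):
        self._codes = {}

    def intern(self, abbrev_blob):
        # Identical declarations share one code, whichever module they came from.
        return self._codes.setdefault(abbrev_blob, len(self._codes) + 1)

    def __len__(self):
        return len(self._codes)

def assign_codes(table, types):
    """types: iterable of (type_id, abbrev_blob, die_blob) tuples."""
    return [(type_id, table.intern(abbrev), die) for type_id, abbrev, die in types]

table = AbbrevTable()
module_a = [("Foo", b"struct-abbrev", b"<attrs>"), ("int", b"base-abbrev", b"<attrs>")]
module_b = [("Bar", b"struct-abbrev", b"<attrs>")]
# Sharing one table across both modules models merging at module merge time.
coded = assign_codes(table, module_a) + assign_codes(table, module_b)
```
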
>>>>
>>>
>>> Makes sense, but wouldn't you need multiple abbreviations for each
>>> DIType, in order to represent DITypes formed of multiple DIEs (e.g. enums,
>>> records)?
>>>
>>> Maybe something like this would work:
>>>
>>> $1 = { DIAbbrev: 0x1234, DIBlob: DW_TAG_enumeration_type<blah> }
>>> $2 = { DIAbbrev: 0x5678, DIBlob: DW_TAG_enumerator<blah> }
>>> $3 = { DIType: <ID>, DIAbbrev: [(0, $1), (8, $2), (16, $2)], DIBlob: <8
>>> bytes of DW_TAG_enumeration_type attrs><8 bytes of DW_TAG_enumerator
>>> attrs><8 bytes of DW_TAG_enumerator attrs><0> }
>>>
>>> ?
>>>
>>
>> *nod* That (or something similar) will work.
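
A small sketch of the (offset, abbreviation) list from the enum example above, assuming the offsets are sorted and each DIE's attribute bytes run up to the next offset. `split_dies` is a hypothetical helper and the blob contents are filler bytes, not real attribute encodings.

```python
def split_dies(abbrev_refs, blob):
    """Return (abbrev_name, attr_bytes) for each DIE in a multi-DIE blob.

    abbrev_refs holds sorted (offset, abbrev_name) pairs; each DIE's bytes
    run from its offset to the next offset, or to the end of the blob.
    """
    dies = []
    for i, (start, abbrev) in enumerate(abbrev_refs):
        end = abbrev_refs[i + 1][0] if i + 1 < len(abbrev_refs) else len(blob)
        dies.append((abbrev, blob[start:end]))
    return dies

# 8 bytes of enumeration attrs, two 8-byte enumerators, then the <0> terminator.
blob = b"E" * 8 + b"a" * 8 + b"b" * 8 + b"\x00"
refs = [(0, "$1"), (8, "$2"), (16, "$2")]
dies = split_dies(refs, blob)
```

Note that in this sketch the last slice also carries the trailing `<0>` byte; a real emitter would have to decide where that terminator lives.
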
>>
>> -eric
>>
>>
>>
>>>
>>>
>>>>
>>>>>
>>>>>> Any other concerns there?
>>>>>>    * Debug information without type units might be slightly larger in
>>>>>> this scheme due to parents being duplicated (declarations and abstract
>>>>>> origin, not full parents). It may be possible to extend dsymutil/etc to
>>>>>> merge all siblings into a common parent. Open question for better ways to
>>>>>> solve this.
>>>>>>
>>>>>
>>>>> When we were thinking about teaching the backend to produce blobs from
>>>>> IR metadata we were thinking about cases where the debug info emitter would
>>>>> discover special member functions during IR traversal. I guess since we're
>>>>> moving all of that to the frontend we can just ask the frontend directly
>>>>> which special members are needed on the class. That solves the problem for
>>>>> a single translation unit. But what do you plan to do in the multiple
>>>>> translation unit case where two TUs declare different special members on a
>>>>> class? Would it be fine to just emit the two definitions and let the
>>>>> debugger sort it out? I guess this is the type of thing that debuggers
>>>>> normally deal with in the non-LTO case, so I suppose so?
>>>>>
>>>>
>>>> Pretty much. This is one area where I have... disagreements with the
>>>> DWARF committee and I don't think there's anything else we can do here. TBH
>>>> right now I think we'd have issues with type units and special member
>>>> functions since we're using ODR-ness to unique.
>>>>
>>>> -eric
>>>>
>>>>
>>>>>
>>>>>
>>>>>> How should we handle DWARF5/Apple Accelerator Tables?
>>>>>>    * Thoughts:
>>>>>>    * We can parse the dwarf in the back end and generate them.
>>>>>>    * We can emit in the front end for the base case of non-LTO (with
>>>>>> help from the backend for relocation aspects).
>>>>>>    * We can use dsymutil on LTO debug information to generate them.
>>>>>>
>>>>>> Why isn’t this a more detailed spec?
>>>>>>    * Mostly because we’ve thought about the issues, but we can’t plan
>>>>>> for everything during implementation.
>>>>>>
>>>>>>
>>>>>> Future work
>>>>>> ----------------
>>>>>>
>>>>>> Not contained as part of this, but an obvious future direction is
>>>>>> that the Module linker could grow support for debug aware linking. Then we
>>>>>> can have all of the type information for a single translation unit in a
>>>>>> single blob and use the debug aware linking to handle merging types.
>>>>>>
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Peter
>>>>>
>>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>>
>>