[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 29 20:11:51 PDT 2016


On Tue, Mar 29, 2016 at 7:43 PM, Eric Christopher <echristo at gmail.com>
wrote:

>
>
> On Tue, Mar 29, 2016 at 7:31 PM Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>> Thanks for sharing this. Mostly seems like a reasonable plan to me. A few
>> comments below.
>>
>>
> Thanks Peter!
>
>
>> On Tue, Mar 29, 2016 at 6:00 PM, Eric Christopher via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> Hi All,
>>>
>>> This is something that's been talked about for some time and it's
>>> probably time to propose it.
>>>
>>> The "We" in this document is everyone on the cc line plus me.
>>>
>>> Please go ahead and take a look.
>>>
>>> Thanks!
>>>
>>> -eric
>>>
>>>
>>> Objective (and TL;DR)
>>> =================
>>>
>>> Migrate debug type information generation from the backends to the front
>>> end.
>>>
>>> This will enable:
>>> 1. Separation of concerns and maintainability: LLVM shouldn’t have to
>>> know about C preprocessor macros, Obj-C properties, or extensive details
>>> about debug information binary formats.
>>> 2. Performance: Skipping a serialization step should speed up normal
>>> compilations.
>>> 3. Memory usage: The DI metadata structures are smaller than they were,
>>> but are still fairly large and pointer heavy.
>>>
>>> Motivation
>>> ========
>>>
>>> Currently, types in LLVM debug info are described by the DIType class
>>> hierarchy. This hierarchy evolved organically from a more flexible
>>> sea-of-nodes representation into what it is today: a large, only somewhat
>>> format-neutral representation of debug types. Making this more format
>>> neutral will only increase the memory use, and for no reason, as type
>>> information is static (or nearly so). Debug formats already have a
>>> memory-efficient serialization (their own binary format), so we should support a
>>> front end emitting type information with sufficient representation to allow
>>> the backend to emit debug information based on the more normal IR features:
>>> functions, scopes, variables, etc.
>>>
>>> Scope/Impact
>>> ===========
>>>
>>> This is going to involve large scale changes across both LLVM and clang.
>>> This will also affect any out-of-tree front ends, however, we expect the
>>> impact to be on the order of a large API change rather than needing massive
>>> infrastructure changes.
>>>
>>> Related work
>>> ==========
>>>
>>> This is related to the efforts to support CodeView in LLVM and clang as
>>> well as efforts to reduce overall memory consumption when compiling with
>>> debug information enabled; in particular, efforts to prune LTO memory usage.
>>>
>>>
>>> Concerns
>>> ========
>>>
>>>
>>> We need a good story for transitioning all the debug info testcases in
>>> the backend without giving up coverage and/or readability. David believes
>>> he has a plan here.
>>>
>>> Proposal
>>> =======
>>>
>>> Short version
>>> -----------------
>>>
>>> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line
>>> Table.
>>> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
>>> 3. Add an LLVM DWARF emission library similar to the existing CodeView
>>> one.
>>> 4. Migrate the Types API into a clang internal API taking clang AST
>>> structures and use the LLVM binary emission libraries to produce type
>>> information.
>>> 5. Remove the old binary emission out of LLVM.
>>>
>>>
>>> Questions/Thoughts/Elaboration
>>> -------------------------------------------
>>>
>>> Splitting the DIBuilder API
>>> ~~~~~~~~~~~~~~~~~~~~
>>> Will DISubprogram be part of both?
>>>    * We should split it in two: Full declarations with type and a
>>> slimmed down version with an abstract origin.
>>>
>>> How will we reference types in the DWARF blob?
>>>    * ODR types can be referenced by name
>>>    * Non-ODR types by full DWARF hash
>>>    * Each type can be a pair (tuple) of identifier (DITypeRef today) and
>>> blob.
>>>    * For < DWARF4 we can emit each type as a unit, but not a DWARF Type
>>> Unit and use references and module relocations for the offsets. (See below)
>>>
>>> How will we handle references in DWARF2 or global relocations for
>>> non-type template parameters?
>>>    * We can use “relocation” metadata as part of the format.
>>>    * Representable as a tuple holding the DIType and the offset within
>>> the DIBlob at which to write the final relocation/offset for the reference
>>> at emission time.
>>>
>>> Why break up the types at all?
>>>    * To enable non-debug format aware linking and type uniquing for LTO
>>> that won’t be huge in size. We break up the types so we don’t need to parse
>>> debug information to link two modules together efficiently.
>>>
>>
>> How do you plan to handle abbreviations? You wouldn't necessarily be able
>> to embed them directly in the blob, as when doing LTO each compilation unit
>> would have its own set of abbreviations. I suppose you could do something
>> like treat them as a special sort of reference to an abbreviation table
>> entry, or maybe pre-allocate in the frontend (but would complicate
>> cross-frontend LTO) but curious what you have in mind.
>>
>
> Thanks for reminding me, I knew I was forgetting something I'd talked
> about when writing all of this down. :)
>
> Basically, to handle abbreviations you can treat them similarly to types
> by creating a blob with an index/hash/etc and then reference that as part
> of the type tuple, e.g.:
>
> $1 = { DIAbbrev: 0x1234, DIBlob: <blah> }
> $2 = { DIType: <ID>, DIAbbrev: $1, DIBlob: <blah> }
>
> and keep them uniqued during emission and remember to merge these as well
> during module merge time.
>

Makes sense, but wouldn't you need multiple abbreviations for each DIType,
in order to represent DITypes formed of multiple DIEs (e.g. enums, records)?

Maybe something like this would work:

$1 = { DIAbbrev: 0x1234, DIBlob: DW_TAG_enumeration_type<blah> }
$2 = { DIAbbrev: 0x5678, DIBlob: DW_TAG_enumerator<blah> }
$3 = { DIType: <ID>,
       DIAbbrev: [(0, $1), (8, $2), (16, $2)],
       DIBlob: <8 bytes of DW_TAG_enumeration_type attrs>
               <8 bytes of DW_TAG_enumerator attrs>
               <8 bytes of DW_TAG_enumerator attrs>
               <0> }

?
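
To make the idea concrete, here is a rough Python sketch of that layout. It
is only an illustration under my own assumptions: Abbrev, TypeRecord,
AbbrevTable, assign_codes and make_enum are hypothetical names (not LLVM
APIs), the type identifiers are made up, and the blob contents are placeholder
zero bytes. It shows a type carrying a list of (offset, abbreviation) pairs,
and a shared table that uniques abbreviations when records from two modules
are merged.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Abbrev:
    """One abbreviation declaration: a tag plus its (attribute, form) list."""
    tag: str
    has_children: bool
    attrs: Tuple[Tuple[str, str], ...]


@dataclass
class TypeRecord:
    """A type as an identifier, an opaque blob, and per-DIE abbrev references."""
    type_id: str
    abbrevs: List[Tuple[int, Abbrev]]  # (byte offset of the DIE in blob, abbrev)
    blob: bytes


class AbbrevTable:
    """Uniques abbreviations structurally and hands out codes on demand."""

    def __init__(self) -> None:
        self._codes: Dict[Abbrev, int] = {}

    def code_for(self, abbrev: Abbrev) -> int:
        # DWARF abbreviation codes start at 1; code 0 marks a null entry.
        if abbrev not in self._codes:
            self._codes[abbrev] = len(self._codes) + 1
        return self._codes[abbrev]

    def __len__(self) -> int:
        return len(self._codes)


def assign_codes(record: TypeRecord, table: AbbrevTable) -> List[Tuple[int, int]]:
    """Map each (offset, abbrev) pair to (offset, final abbreviation code)."""
    return [(offset, table.code_for(abbrev)) for offset, abbrev in record.abbrevs]


def make_enum(type_id: str) -> TypeRecord:
    """Build an enum with two enumerators, as a frontend might (hypothetical)."""
    enum = Abbrev("DW_TAG_enumeration_type", True,
                  (("DW_AT_name", "DW_FORM_strp"),
                   ("DW_AT_byte_size", "DW_FORM_data4")))
    enumerator = Abbrev("DW_TAG_enumerator", False,
                        (("DW_AT_name", "DW_FORM_strp"),
                         ("DW_AT_const_value", "DW_FORM_data4")))
    # 8 placeholder bytes of attributes per DIE, then a 0 byte ending the children.
    blob = bytes(8) + bytes(8) + bytes(8) + b"\x00"
    return TypeRecord(type_id, [(0, enum), (8, enumerator), (16, enumerator)], blob)


# Two modules each bring their own abbreviation objects; the shared table
# uniques them, so both types end up referring to the same two codes.
merged = AbbrevTable()
codes_a = assign_codes(make_enum("_ZTS5Color"), merged)
codes_b = assign_codes(make_enum("_ZTS5Shape"), merged)
```

The point of keeping the offsets outside the blob is that the merge step only
has to renumber the (offset, abbreviation) list and never needs to parse the
DIE attributes themselves.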


>
>>
>>> Any other concerns there?
>>>    * Debug information without type units might be slightly larger in
>>> this scheme due to parents being duplicated (declarations and abstract
>>> origin, not full parents). It may be possible to extend dsymutil/etc to
>>> merge all siblings into a common parent. Open question for better ways to
>>> solve this.
>>>
>>
>> When we were thinking about teaching the backend to produce blobs from IR
>> metadata we were thinking about cases where the debug info emitter would
>> discover special member functions during IR traversal. I guess since we're
>> moving all of that to the frontend we can just ask the frontend directly
>> which special members are needed on the class. That solves the problem for
>> a single translation unit. But what do you plan to do in the multiple
>> translation unit case where two TUs declare different special members on a
>> class? Would it be fine to just emit the two definitions and let the
>> debugger sort it out? I guess this is the type of thing that debuggers
>> normally deal with in the non-LTO case, so I suppose so?
>>
>
> Pretty much. This is one area where I have... disagreements with the DWARF
> committee and I don't think there's anything else we can do here. TBH right
> now I think we'd have issues with type units and special member functions
> since we're using ODR-ness to unique.
>
> -eric
>
>
>>
>>
>>> How should we handle DWARF5/Apple Accelerator Tables?
>>>    * Thoughts:
>>>    * We can parse the dwarf in the back end and generate them.
>>>    * We can emit in the front end for the base case of non-LTO (with
>>> help from the backend for relocation aspects).
>>>    * We can use dsymutil on LTO debug information to generate them.
>>>
>>> Why isn’t this a more detailed spec?
>>>    * Mostly because we’ve thought about the issues, but we can’t plan
>>> for everything during implementation.
>>>
>>>
>>> Future work
>>> ----------------
>>>
>>> Not contained as part of this, but an obvious future direction is that
>>> the Module linker could grow support for debug aware linking. Then we can
>>> have all of the type information for a single translation unit in a single
>>> blob and use the debug aware linking to handle merging types.
>>>
>>>
>>>
>>
>>
>> --
>> --
>> Peter
>>
>


-- 
Peter