[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 31 21:52:22 PDT 2016


Thanks - will keep that in mind!
On Mar 31, 2016 9:35 PM, "Mehdi Amini" <mehdi.amini at apple.com> wrote:

>
> On Mar 31, 2016, at 8:50 PM, Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> On Mar 31, 2016, at 7:11 PM, David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> On Tue, Mar 29, 2016 at 11:50 PM, Eric Christopher via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Tue, Mar 29, 2016 at 11:20 PM Robinson, Paul <
>> Paul_Robinson at playstation.sony.com> wrote:
>>
>>> Skipping a serialization and doing something clever about LTO uniquing
>>> sounds awesome.  I'm guessing you achieve this by extracting types out of
>>> DI metadata and packaging them as lumps-o-DWARF that the back-end can then
>>> paste together?  Reading between the lines a bit here.
>>>
>>>
>> Pretty much, yes.
>>
>>
>>> Can you share data about how much "pure" types dominate the size of
>>> debug info?  Or at least the current metadata scheme?  (Channeling Sean
>>> Silva here: show me the data!)  Does this hold for C as well as C++?
>>>
>> They're huge. It's ridiculous. Take a look at the size of the metadata
>> and then the size of the stuff we put in there versus DWARF.
>>
>
> Because numbers are nice to have, I modified Clang to generate every type
> as 'int' (patch attached - I may've screwed some things up) & then compiled
> llvm-tblgen's object files with -flto (I would've used all of clang, but I
> don't have the lto plugin setup, so I couldn't get past tblgen)
>
>
> I guess you have a non-LTO build somewhere, so you should be able to build
> other tools by bypassing the llvm-tblgen build using:
>
> cmake -DLLVM_TABLEGEN=path/to/llvm-tblgen ..
>
>
> To be clear: that was meant as FYI / good to know, I was not asking you
> for more data.
>
> --
> Mehdi
>
>
>
>
>
>
> Without debug info: 24 MB of bitcode files
> With debug info: 77 MB
> With debug info, but no types: 46 MB
>
> so... 59% is pure type descriptions, for this particular test at least.
> (These are the pure ones, the same things we put in type units. I didn't
> even remove the injected declarations, so if you compile example programs
> with this patch you'll find that the DW_TAG_base_type for "int" has a
> child for every member function declaration that's defined (even used
> inline functions) in this translation unit.) Clang would be a larger/more
> representative sample.
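
A quick check of the arithmetic, as a sketch rather than anything from the attached patch: the 59% figure is consistent with reading the sizes as 24 MB without debug info, 77 MB with debug info, and 46 MB with debug info but no types. The variable names below are made up for illustration.

```python
# Sizes in MB of llvm-tblgen's bitcode files, read as described above.
no_debug = 24         # built without debug info
full_debug = 77       # built with debug info
debug_no_types = 46   # built with debug info, every type replaced by 'int'

debug_overhead = full_debug - no_debug        # everything debug info adds
type_overhead = full_debug - debug_no_types   # the part that is pure types
type_share = type_overhead / debug_overhead

print(f"types are {type_share:.1%} of the debug info overhead")
```

That is 31 MB of type descriptions out of 53 MB of debug info, which is where the roughly 59% comes from.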
>
> I confirmed that both with and without types, there were the same number
> (48542) of subprogram definitions and without types there were no instances
> of DICompositeType (both of these were confirmed with xargs/llvm-dis/grep,
> nothing fancy)
>
>
>
>
>>
>> And yes, it also trivially holds for C.
>>
>>
>>> Not much discussion of data objects and code objects (other than
>>> concrete subprograms), is that because they basically aren't changing?
>>> Still defined in the metadata and still managed/emitted by the back-end?
>>>
>>
>> Yep. A way of looking at it is more that it is related to things in the
>> IR and so needs IR to represent it.
>>
>>
>>> Please say something about types (which you're thinking of as a
>>> front-end thing) defined within scopes (which it looks like you're thinking
>>> of as a back-end thing).  Not seeing how to get the scoping right.
>>>
>>>
>>>
>>
>> Basic idea is that non-defining declarations hold the types and are the
>> abstract origin for the concrete function? Honestly, I wish they were type
>> unitable at the moment, but that might be something to look into. That's
>> the current plan, at least. This will make some debug info a little bit
>> larger, but only for things like nested types where we need to throw in an
>> extra declaration (i.e. the same sorts of places that type units make
>> things larger).
>>
>> At any rate, the first thing is to get the APIs split anyhow.
>>
>> -eric
>>
>>
>>> Thanks!
>>>
>>> --paulr
>>>
>>>
>>>
>>> *From:* cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] *On Behalf Of *Eric
>>> Christopher via cfe-dev
>>> *Sent:* Tuesday, March 29, 2016 6:01 PM
>>> *To:* Clang Dev; llvm-dev
>>> *Subject:* [cfe-dev] RFC: Up front type information generation in clang
>>> and llvm
>>>
>>>
>>>
>>> Hi All,
>>>
>>>
>>>
>>> This is something that's been talked about for some time and it's
>>> probably time to propose it.
>>>
>>>
>>>
>>> The "We" in this document is everyone on the cc line plus me.
>>>
>>>
>>>
>>> Please go ahead and take a look.
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>>> -eric
>>>
>>>
>>>
>>>
>>>
>>> Objective (and TL;DR)
>>>
>>> =================
>>>
>>>
>>>
>>> Migrate debug type information generation from the backends to the front
>>> end.
>>>
>>>
>>>
>>> This will enable:
>>>
>>> 1. Separation of concerns and maintainability: LLVM shouldn’t have to
>>> know about C preprocessor macros, Obj-C properties, or extensive details
>>> about debug information binary formats.
>>>
>>> 2. Performance: Skipping a serialization should speed up normal
>>> compilations.
>>>
>>> 3. Memory usage: The DI metadata structures are smaller than they were,
>>> but are still fairly large and pointer heavy.
>>>
>>>
>>>
>>> Motivation
>>>
>>> ========
>>>
>>>
>>>
>>> Currently, types in LLVM debug info are described by the DIType class
>>> hierarchy. This hierarchy evolved organically from a more flexible
>>> sea-of-nodes representation into what it is today - a large, only somewhat
>>> format neutral representation of debug types. Making this more format
>>> neutral will only increase the memory use, and for no reason, as type
>>> information is static (or nearly so). Debug formats already have a
>>> memory-efficient serialization: their own binary format. So we should
>>> support a front end emitting type information with sufficient
>>> representation to allow the backend to emit debug information based on
>>> the more normal IR features: functions, scopes, variables, etc.
>>>
>>>
>>>
>>> Scope/Impact
>>>
>>> ===========
>>>
>>>
>>>
>>> This is going to involve large scale changes across both LLVM and clang.
>>> This will also affect any out-of-tree front ends, however, we expect the
>>> impact to be on the order of a large API change rather than needing massive
>>> infrastructure changes.
>>>
>>>
>>>
>>> Related work
>>>
>>> ==========
>>>
>>>
>>>
>>> This is related to the efforts to support CodeView in LLVM and clang as
>>> well as efforts to reduce overall memory consumption when compiling with
>>> debug information enabled;  in particular efforts to prune LTO memory usage.
>>>
>>>
>>>
>>>
>>>
>>> Concerns
>>>
>>> ========
>>>
>>>
>>>
>>>
>>>
>>> We need a good story for transitioning all the debug info testcases in
>>> the backend without giving up coverage and/or readability. David believes
>>> he has a plan here.
>>>
>>>
>>>
>>> Proposal
>>>
>>> =======
>>>
>>>
>>>
>>> Short version
>>>
>>> -----------------
>>>
>>>
>>>
>>> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line
>>> Table.
>>>
>>> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
>>>
>>> 3. Add a LLVM DWARF emission library similar to the existing CodeView
>>> one.
>>>
>>> 4. Migrate the Types API into a clang internal API taking clang AST
>>> structures and use the LLVM binary emission libraries to produce type
>>> information.
>>>
>>> 5. Remove the old binary emission out of LLVM.
>>>
>>>
>>>
>>>
>>>
>>> Questions/Thoughts/Elaboration
>>>
>>> -------------------------------------------
>>>
>>>
>>>
>>> Splitting the DIBuilder API
>>>
>>> ~~~~~~~~~~~~~~~~~~~~
>>>
>>> Will DISubprogram be part of both?
>>>
>>>    * We should split it in two: Full declarations with type and a
>>> slimmed down version with an abstract origin.
>>>
>>>
>>>
>>> How will we reference types in the DWARF blob?
>>>
>>>    * ODR types can be referenced by name.
>>>
>>>    * Non-ODR types by full DWARF hash.
>>>
>>>    * Each type can be a pair (tuple) of identifier (DITypeRef today) and
>>> blob.
>>>
>>>    * For < DWARF4 we can emit each type as a unit (but not a DWARF Type
>>> Unit) and use references and module relocations for the offsets. (See below)
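
To make the identifier-plus-blob idea concrete, here is a minimal sketch of how uniquing could work without ever parsing the blobs. It is only an illustration under stated assumptions: `type_entry` and `merge_types` are hypothetical names, SHA-256 stands in for the "full DWARF hash", the blobs are placeholder byte strings, and `_ZTS3Foo` is just an example of a name-style identifier.

```python
import hashlib
from typing import Dict, Optional, Tuple


def type_entry(blob: bytes, odr_name: Optional[str] = None) -> Tuple[str, bytes]:
    """Build an (identifier, blob) pair for one type.

    ODR types are identified by name; all others by a hash of the blob.
    SHA-256 is only a stand-in for the "full DWARF hash".
    """
    if odr_name is not None:
        return ("odr:" + odr_name, blob)
    return ("hash:" + hashlib.sha256(blob).hexdigest(), blob)


def merge_types(dst: Dict[str, bytes], src: Dict[str, bytes]) -> None:
    """Unique types across modules by identifier alone; blobs stay opaque."""
    for ident, blob in src.items():
        dst.setdefault(ident, blob)  # keep the first blob seen for each identifier


# Two modules that both describe `struct Foo` and each have one local type.
module_a = dict([type_entry(b"<DIEs for Foo>", "_ZTS3Foo"),
                 type_entry(b"<DIEs for an anonymous struct>")])
module_b = dict([type_entry(b"<DIEs for Foo>", "_ZTS3Foo"),
                 type_entry(b"<DIEs for an anonymous union>")])

linked: Dict[str, bytes] = {}
merge_types(linked, module_a)
merge_types(linked, module_b)
print(sorted(linked))  # Foo appears once, plus the two distinct local types
```

The point of the sketch is that the merge step only compares identifiers, so the linker never needs to understand the debug format.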
>>>
>>>
>>>
>>> How will we handle references in DWARF2 or global relocations for
>>> non-type template parameters?
>>>
>>>    * We can use a “relocation” metadata as part of the format.
>>>
>>>    * Representable as a tuple of the DIType and the offset within the
>>> DIBlob at which to write the final relocation/offset for the reference
>>> at emission time.
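
One way to picture the "relocation" tuple, as a rough sketch and not the proposed metadata format: each type carries its blob plus a list of (offset within blob, referenced type) pairs, and the emitter patches the placeholders once it knows where every blob lands. The `emit` function, the 4-byte little-endian reference width, and the toy blobs are all assumptions made for illustration.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical representation: each type is (blob, relocations), where a
# relocation is (offset_in_blob, identifier_of_referenced_type).
TypeEntry = Tuple[bytes, List[Tuple[int, str]]]

PLACEHOLDER = b"\xff\xff\xff\xff"  # 4-byte slot to be filled in at emission time


def emit(types: Dict[str, TypeEntry]) -> bytes:
    # First pass: decide where each blob starts in the output section.
    start: Dict[str, int] = {}
    cursor = 0
    for ident, (blob, _) in types.items():
        start[ident] = cursor
        cursor += len(blob)

    # Second pass: copy each blob and overwrite every placeholder with the
    # final offset of the type it refers to.
    out = bytearray()
    for ident, (blob, relocs) in types.items():
        patched = bytearray(blob)
        for offset, target in relocs:
            struct.pack_into("<I", patched, offset, start[target])
        out += patched
    return bytes(out)


types: Dict[str, TypeEntry] = {
    "int": (b"INT!", []),
    "ptr": (b"PTR>" + PLACEHOLDER, [(4, "int")]),  # pointer to int
    "vec": (b"VEC>" + PLACEHOLDER, [(4, "ptr")]),  # refers to the pointer type
}
section = emit(types)
print(section.hex())
```

Because the relocations live next to the blob rather than inside it, blobs can be reordered or dropped during linking and the references are still resolved correctly at the end.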
>>>
>>>
>>>
>>> Why break up the types at all?
>>>
>>>    * To enable non-debug format aware linking and type uniquing for LTO
>>> that won’t be huge in size. We break up the types so we don’t need to parse
>>> debug information to link two modules together efficiently.
>>>
>>>
>>>
>>> Any other concerns there?
>>>
>>>    * Debug information without type units might be slightly larger in
>>> this scheme due to parents being duplicated (declarations and abstract
>>> origin, not full parents). It may be possible to extend dsymutil/etc to
>>> merge all siblings into a common parent. Open question for better ways to
>>> solve this.
>>>
>>>
>>>
>>> How should we handle DWARF5/Apple Accelerator Tables?
>>>
>>>    * Thoughts:
>>>
>>>    * We can parse the DWARF in the back end and generate them.
>>>
>>>    * We can emit in the front end for the base case of non-LTO (with
>>> help from the backend for relocation aspects).
>>>
>>>    * We can use dsymutil on LTO debug information to generate them.
>>>
>>>
>>>
>>> Why isn’t this a more detailed spec?
>>>
>>>    * Mostly because we’ve thought about the issues, but we can’t plan
>>> for everything during implementation.
>>>
>>>
>>>
>>>
>>>
>>> Future work
>>>
>>> ----------------
>>>
>>>
>>>
>>> Not contained as part of this, but an obvious future direction is that
>>> the Module linker could grow support for debug aware linking. Then we can
>>> have all of the type information for a single translation unit in a single
>>> blob and use the debug aware linking to handle merging types.
>>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
> <notypes.diff>_______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>