[PATCH] Have clang list the imported modules in the debug info
Adrian Prantl via cfe-commits
cfe-commits at lists.llvm.org
Mon Aug 24 13:23:42 PDT 2015
> On Aug 19, 2015, at 1:20 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Mon, Aug 10, 2015 at 5:00 PM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>
>> On Jul 24, 2015, at 12:33 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>
>> *reads back through the thread*
>
> appreciated, it’s long :-)
>
>> So what I originally had in mind about a year ago when we discussed this, was that the module data could have an extra table from type hash to whatever useful internal representation to find the type in the PCM.
>
> It turned out that the most useful internal type representation to find a type in a PCM is the type’s DeclContext+Name; this is how (surprise!) clang looks up types in a PCM and the format is supposed to be fast for these kind of lookups.
>
> Still, I would imagine there would be some kind of direct access (the offset in the file, or somesuch) rather than actually having to go through hashtables, etc. No? (how does one module refer to types in another module? Really by name?)
Entities in PCMs have local integer IDs (just consecutive numbers) that are used to encode references inside a record on disk. An external reference gets a global ID which is the local ID + the ID of the other module. This numbering scheme of course only makes sense within a module (and perhaps also within a chained PCH). Every (local and global) ID maps to an entry in the PCM’s identifier table. When deserializing, the in-memory global IdentifierTable is built and each module gets a map that remaps its internal global IDs to the “global” global IDs.
For debug info these IDs are not very useful, because they are not resilient against even the smallest additive change to the module. We could add an integer attribute with the module-internal entity ID to the forward declaration that the debugger can use if the module hash and the CU’s dwoid are matching, but I’m not yet convinced that it would be worth it. Adding it to the definition of the type in the module dwarf seems not worth it because then we already have to do a similarly expensive lookup to find the module containing the definition.
>
>> Everything else would just be DWARF with type units and fission (with the slight wrinkle of type units that aren't resolvable within a single object file - they could reference cross-object/dwo file) - emitting a fission CU for each referenced module.
>>
>> Needing modules to disambiguate/avoid collisions/support non-odr languages wasn't something I understood/had considered back then. That explains the need to add module references to the CU, so the debugger can know which modules to search for the types in (& doesn't just go loading all of them, etc).
>>
>> I would still picture this as "normal type units + a table in the module to resolve types", but if you guys particularly like using the mangled string name (or other identifier) in the DWARF that may avoid the need for an intermediate table (but it doesn't sound like you are avoiding an intermediate table - you said something about having an accelerator-table-like thing to aid in the DWARF->AST mapping? So could that be key'd of the type hash/signature we're using, thus keeping the DWARF more plain/vanilla DWARF5 (type units + fission)?)
>
> I originally disliked type signatures and favored using mangled names because the mangled names contained the DeclContext necessary to find types in the PCM. But if we can squeeze the DeclContext somewhere else, that’s fine.
>
> From the discussion we had on FlagExternalTypeRef I got the impression that long-form forward declarations are starting to look more attractive: If every external type reference is a reference to a forward declaration that has a complete decl context,
>
> While that's conveniently what we output currently, I'm not sure it's a great idea to rely on it. We might one day optimize type references (& we'll certainly need to optimize non-type references like member functions, etc - since emitting a stub for those would start to, more visibly, reduce the benefit of doing this work in the first place, I would imagine) so that when there's no contents (which will be more common once we can reference members directly with Bag O DWARF) we just have a DW_AT_type encoded as a DW_FORM_ref_sig8 directly.
Emitting a ref_sig8 directly only works in C++, where the ODR guarantees that each signature is globally unique. A consumer would have to search for the type definition in all modules (rather than finding it via the decl context), which is less efficient, but still works because of the ODR. Thus we’re only depending on the long-form forward declarations in non-ODR languages, where we need the decl context to disambiguate between non-unique signatures.
>
> with a DW_TAG_module at the root of the decl context chain,
>
> This ^ is something I didn't have in mind and would complicate things somewhat. I'd really like to keep things as close to the existing standard type unit + split dwarf standards as possible except where necessary to do otherwise.
In what way does it complicate things?
>
> and a DW_AT_name+DW_AT_signature at the other end, we would have all the information we need without introducing any further LLVM-specific DWARF extensions. To look up an external type from the PCM, the consumer imports the DW_TAG_module and deserializes the type found by declcontext+name. To load the type from DWARF, the consumer grabs the signature from the forward declaration and magically (1) finds the PCM and looks up the type by signature (2).
>
> (1) My suggestion is to extend LLVM so it can put the DW_TAG_module with the forward declaration inside the skeleton compile unit (which has the path to the PCM and its DWOid).
> (2) On ELF with type units this works out of the box,
>
> Not necessarily - the use of DW_TAG_modules in the scope chain might confuse/break things. It's pretty unprecedented/non-standard, I would think?
It’s use for C++ is unprecedented, but the tag itself very standard.
>
> on MachO without type units we need some kind of index mapping signature -> DIE (bag of DWARF style?).
>
> I was rather hoping you guys would implement type units (since they'll be a step towards Bag O DWARF anyway) on MachO... - at least for this case. They wouldn't have to be COMDAt'd or each in their own section, they'd just be in a debug_types section one after the other in the module .o file.
How does a consumer today find a type unit for a given signature? Does it build its own index based on the signatures of the COMDAT sections? DWP files define a .debug_tu_index accelerator section to this end, but how is this normally handled?
> Assuming that many external types will share a similar DeclContext prefix I am not very worried by the space needed to store the long forward references. Compared to storing the mangled name for every type
>
> What I was picturing wasn't to storet the mangled name anywhere - but to have, in the module object file somewhere a table from type hash to <useful way of accessing a type in a module, which I hope is just a byte offset or something similary cheap, small, and fast - not a table lookup, etc, but whatever the table lookup has as its values already>.
With the decl context and the TAG_module on top a consumer knows which module to import (fast) and can do the import by name (hash lookup). With the table in the module object and no TAG_module in the decl context, it has to scan all modules for the type signature (multiple hash lookups) and can directly import the type (fast).
-- adrian
>
> it will often actually take up less space. Also, on Darwin at least, llvm-dsymutil can strip them out of the way after resolving the external type references.
>
> -- adrian
>>
>> - Dave
>>
>> On Wed, May 6, 2015 at 3:43 PM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>
>>> On May 6, 2015, at 3:24 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>
>>>
>>>
>>> On Wed, May 6, 2015 at 3:15 PM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>>
>>>> On May 6, 2015, at 2:52 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, May 6, 2015 at 2:45 PM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>>>
>>>>> On May 6, 2015, at 2:35 PM, Eric Christopher <echristo at gmail.com <mailto:echristo at gmail.com>> wrote:
>>>>>
>>>>> That said, add enough to the name for hashing purposes to make it hash uniquely? Or you can go down the path of hashing the type similar to the fission CU hashing (which is what type units were arguably designed to do in the first place if you take a look at the standard, we just only use them for ODR compliant languages etc right now).
>>>>
>>>> I suppose one could hash the entire module configuration + the mangled name and get something that is relatively stable.
>>>> For implementation reasons it would be terrible to do the full fission hashing because that would mean that we would actually have to look up (and deserialize the type) in order to get to its ID when emitting an external type reference, which would void at least some of the performance gains we want from module debugging.
>>>>
>>>> I thought you were proposing using the mangled name of the type for the identifier anyway? Perhaps I misunderstood - what are you proposing to use? In any case, I'd prefer to see whatever it is hashed and used as the type unit signature for compatibility with DWARF5, rather than adding an extra/separate/new/non-standard way to do cross-unit/cross-fission type references.
>>>
>>> In the IR I’d /like/ to have a DIExternalTypeRef(DW_TAG_class_type, !”_ZTC6TypeName”, !1) with !1 being a reference to either the DIModule or the skeleton CU. Then the backend would emit the hash of the name if type units are enabled (C++/gdb) or the mangled name (+ the accelerator table entry) otherwise (ObjC and/or Darwin). If there is significant pushback to the latter, I’d be willing to have the backend emit a hash in both cases but we’d have to careful about what to exactly to hash for all the aforementioned reasons.
>>>
>>> I don't follow - if the mangled name is sufficient, then a hash of the mangled name should be.... what am I missing?
>>>
>>
>> Nothing, these are two separate issues:
>>
>>> If you don't have an ODR to rely on, then a mangled name seems insufficient just as the hash would be.
>>
>> The decision for mangled name vs hash is motivated by the mangled name also doubling as a key to look up the type in the AST.
>> The other problem is (partially) solved by the accelerator table entry that associates the mangled name with a module. I’m starting to think now that it might be better to include a fission-style forward declaration + decl context into the TAG_module instead. The DWARF-style decl context could in theory be smaller than the mangled name because two types could share common ancestors and then we could emit the same hash of the mangled names as we do for type units. But let’s discuss this when it comes up (together with the patch that makes use of DIExternalTypeRef).
>>
>> -- adrian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150824/1a975718/attachment-0001.html>
More information about the cfe-commits
mailing list