[PATCH] Have clang list the imported modules in the debug info

David Blaikie via cfe-commits cfe-commits at lists.llvm.org
Mon Aug 24 17:54:13 PDT 2015


On Mon, Aug 24, 2015 at 5:33 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Aug 24, 2015, at 4:17 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Mon, Aug 24, 2015 at 3:34 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Aug 24, 2015, at 2:01 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Mon, Aug 24, 2015 at 1:23 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> On Aug 19, 2015, at 1:20 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>>
>>> On Mon, Aug 10, 2015 at 5:00 PM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>>
>>>>
>>>> On Jul 24, 2015, at 12:33 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>
>>>> *reads back through the thread*
>>>>
>>>>
>>>> appreciated, it’s long :-)
>>>>
>>>> So what I originally had in mind about a year ago when we discussed
>>>> this, was that the module data could have an extra table from type hash to
>>>> whatever useful internal representation to find the type in the PCM.
>>>>
>>>>
>>>> It turned out that the most useful internal type representation to find
>>>> a type in a PCM is the type’s DeclContext+Name; this is how (surprise!)
>>>> clang looks up types in a PCM and the format is supposed to be fast for
>>>> this kind of lookup.
>>>>
>>>
>>> Still, I would imagine there would be some kind of direct access (the
>>> offset in the file, or somesuch) rather than actually having to go through
>>> hashtables, etc. No? (how does one module refer to types in another module?
>>> Really by name?)
>>>
>>>
>>> Entities in PCMs have local integer IDs (just consecutive numbers) that
>>> are used to encode references inside a record on disk. An external
>>> reference gets a global ID which is the local ID + the ID of the other
>>> module. This numbering scheme of course only makes sense within a module
>>> (and perhaps also within a chained PCH). Every (local and global) ID maps
>>> to an entry in the PCM’s identifier table. When deserializing, the
>>> in-memory global IdentifierTable is built and each module gets a map that
>>> remaps its internal global IDs to the “global” global IDs.
>>>
>>> For debug info these IDs are not very useful, because they are not
>>> resilient against even the smallest additive change to the module. We could
>>> add an integer attribute with the module-internal entity ID to the forward
>>> declaration that the debugger can use if the module hash and the CU’s dwoid
>>> are matching, but I’m not yet convinced that it would be worth it. Adding
>>> it to the definition of the type in the module dwarf seems not worth it
>>> because then we already have to do a similarly expensive lookup to find the
>>> module containing the definition.
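
To illustrate the resilience concern above, here is a minimal sketch. It assumes entity IDs are simply handed out consecutively in declaration order; the `assign_ids` helper and the declaration names are hypothetical and are not Clang's serialization code. One purely additive change shifts every later ID, while a name-based key still finds the intended type.

```python
def assign_ids(decl_names):
    """Assign consecutive integer IDs in declaration order (hypothetical scheme)."""
    return {name: i for i, name in enumerate(decl_names, start=1)}

# Module contents at the time the program's debug info was emitted.
old_ids = assign_ids(["Point", "Rect", "Window"])

# The same module after one declaration was added near the top.
new_ids = assign_ids(["Color", "Point", "Rect", "Window"])

# A consumer that recorded the old ID of "Rect" now lands on another type.
stale_id = old_ids["Rect"]
id_to_name = {i: name for name, i in new_ids.items()}
resolved_by_id = id_to_name[stale_id]

# A name-based lookup is unaffected by the insertion.
resolved_by_name = "Rect" if "Rect" in new_ids else None
```

This is why an ID-based reference would need the module hash check to be mandatory, as discussed further down the thread.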
>>>
>>
>> It seems that ideally a module-aware debugger would not consult the DWARF
>> at all, so I wouldn't advocate having the module format type ID in the
>> DWARF, but in a side table (either part of the module itself, or just a
>> section of its own in the module-as-object-file).
>>
>>
>> Do you mean a side-table in the module that maps sig8 -> ID?
>>
>
> Yes, this is what I tried to articulate way-back-when we had the original
> discussion in person at Google: that module debug info is DWARF type units
> (later bag-o-dwarf) + fission, plus a side table to map DWARF type
> identifiers to module identifiers.
>
> Certainly there are some wrinkles to that (possible type ambiguities and
> module conflicts), but I'm really trying to stick to that original concept
> and understand/investigate any deviation from it. (& more generally, any
> deviation from existing DWARF practices in the field - while extensions are
> totally reasonable, I want to really highlight when we're stepping beyond
> existing practice and carefully choose how we do that rather than just
> picking a representation and going with it - DWARF doesn't provide a lot of
> guidance and we can pick essentially arbitrary DWARF to mean near arbitrary
> things, and that'll essentially be a contract between Clang and debuggers
> when we do so, so I'm inclined to be rather careful when making those
> choices (because we'll eventually have to convince other debuggers to
> implement those things, etc))
>
>
>> If we need to perform hash lookups to get to the ID then we might as well
>> look up the type by name in the module, because that’s also a hash table
>> lookup. I don’t think a performance argument can be made for using IDs.
>>
>
> Having to hash the whole mangled name (let alone piecing it together from
> the names in the scope chain, walking the DIE parent chain, etc) is going
> to be more expensive than using the existing sig8.
>
>
> That’s not what I had in mind, let me clarify how I imagined this: A DWARF
> consumer follows the reference to the forward declaration, reads the
> AT_signature, looks up the type definition by signature, and reconstructs
> the type from the definition. An AST consumer also follows the reference to
> the forward declaration, but goes up the scope chain, finds the TAG_module,
> imports it, and then uses sema to load the type by name, thus causing clang
> to deserialize the type without the consumer getting its hands dirty with
> module internals.
>

That does seem expensive to me - to do name lookup, etc.
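
A minimal sketch of the two lookup paths being compared, assuming a toy in-memory DIE tree. The `DIE` class, the module and type names, and the dictionaries standing in for type units and module contents are all hypothetical. It only shows the shape of the work: the signature path is one integer-keyed lookup, while the name path first walks the parent chain to assemble the scope.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DIE:
    tag: str
    name: str
    parent: Optional["DIE"] = None
    signature: Optional[int] = None

def qualified_name(die):
    """Walk the parent chain up to the enclosing DW_TAG_module.

    Returns (module name or None, "::"-joined scope path)."""
    parts = []
    node = die
    while node is not None and node.tag != "DW_TAG_module":
        parts.append(node.name)
        node = node.parent
    module_name = node.name if node is not None else None
    return module_name, "::".join(reversed(parts))

# Forward declaration nested in a namespace nested in a module.
module = DIE("DW_TAG_module", "Geometry")
ns = DIE("DW_TAG_namespace", "geo", parent=module)
fwd = DIE("DW_TAG_structure_type", "Point", parent=ns,
          signature=0x1122334455667788)

# Path 1 (DWARF consumer): look the definition up by signature.
type_units = {0x1122334455667788: "definition of geo::Point"}
via_signature = type_units[fwd.signature]

# Path 2 (AST consumer): find the module, then look the type up by name.
modules = {"Geometry": {"geo::Point": "definition of geo::Point"}}
mod_name, path = qualified_name(fwd)
via_name = modules[mod_name][path]
```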


>
>
>>
>> Why is it important that the IDs be resilient to changes in the module?
>>
>>
>> Several reasons I can think of:
>> - developers are used to being able to debug a binary with slightly
>> out-of-date source code.
>>
>
> I'm confused - the debug info and the module are built together, and the
> user's program is built from that.
>
> You're trying to support the case where the user's binary is built but
> then the module is rebuilt with a minor change? I think that's a fairly
> losing battle & one I'd be pretty concerned about attempting to support.
>
>
>> - if we use IDs we need to make the dwoid==modulehash check mandatory or
>> we resolve random types, but that means that even adding whitespace to a
>> header file means that module debugging won’t work any more.
>>
>
> Yes, generally debug info is invalidated by changes to the source. Changes
> to whitespace not related to line wrapping could be supported, it's a
> really narrow slice - as soon as something changes lines you're out of
> luck. What % of cases do you think this would be successful for? (& if you
> only provide a warning in the cases where they don't match - that seems
> likely to result in some major user confusion when their lines are off, etc
> (well, I suppose that's true today if they edit the source and don't
> rebuild, but still - debuggers tend to warn about that too))
>
>
> That’s true.
>
>
>
>> - by using IDs we would be baking internal details about the clang module
>> format into the debug info:
>>
>
> It wouldn't be in the debug info (like I said, I don't expect a
> module-aware debugger should be reading any DWARF in the module) but a side
> table to go with the module data.
>
>
>>   - what if the clang module format changes?
>>
>
> Then the side table format would change as well, I would imagine.
>
>
>>   - a non-clang compiler+debugger could implement our debug info format
>> with a different AST serialization scheme.
>>
>
> And a different hash->entity mapping.
>
>
> Okay. One could see this as an (optional) accelerator table for consumers
> willing to manually deserialize decls (rather than just importing the
> module having sema sort it out)?
>

Sort of optional, but sort of not - if it's optional then you constrain the
DWARF used to describe the type (it's not enough to simply use DW_AT_type
with a ref_sig8, for example - which is a totally reasonable way to
reference a type unit in DWARF - instead you must emit the type declaration
and you must emit that declaration in a DW_TAG_module, etc). That seems like
an unfortunate tradeoff especially as we go forward trying to make debug
info smaller (emitting the context for every type into every module that
references the type would be something we might like to avoid).


>
>
>
>>
>>
>> I'm suggesting that non-module debug info should only use type hashes,
>> then within the module itself there would be a hash-to-ID mapping (a flat
>> table, probably, wouldn't need to be anything fancy).
>>
>> This would keep the DWARF less polluted by module concepts - it'd just be
>> up to the consumer "oh, here's a type hash identifier, resolve that to an
>> AST - either by looking in DWARF and building an AST from the DWARF type
>> unit with the matching hash, or by looking in the module and finding the
>> ID/loading the type”?
>>
>>
>> This is glossing over the fact that more metadata is needed to find and
>> import the right version of the module. Metadata that is stored in the
>> DW_TAG_module. Where is the module to be found (E.g., on Darwin the sysroot
>> tells us whether to look in an OSX or an iOS SDK), which configuration of
>> the module do we need to import (is -DNDEBUG enabled?), and the include
>> path tells us where to find the header file if the module cache is out of
>> date or deleted.
>>
>> Ok there are really three issues here:
>> Should module types be looked up by ID or by name?
>> Should module type forward declarations be underneath the DW_TAG_module
>> and/or the skeleton CU?
>> And, can we get away with just emitting ref_sig8s for ODR languages?
>>
>
>
> These are some of the issues we've been touching on, yes.
>
>>
>>
>>>
>>>
>>>> Everything else would just be DWARF with type units and fission (with
>>>> the slight wrinkle of type units that aren't resolvable within a single
>>>> object file - they could reference cross-object/dwo file) - emitting a
>>>> fission CU for each referenced module.
>>>>
>>>> Needing modules to disambiguate/avoid collisions/support non-odr
>>>> languages wasn't something I understood/had considered back then. That
>>>> explains the need to add module references to the CU, so the debugger can
>>>> know which modules to search for the types in (& doesn't just go loading
>>>> all of them, etc).
>>>>
>>>> I would still picture this as "normal type units + a table in the
>>>> module to resolve types", but if you guys particularly like using the
>>>> mangled string name (or other identifier) in the DWARF that may avoid the
>>>> need for an intermediate table (but it doesn't sound like you are avoiding
>>>> an intermediate table - you said something about having an
>>>> accelerator-table-like thing to aid in the DWARF->AST mapping? So could
>>>> that be keyed off the type hash/signature we're using, thus keeping the
>>>> DWARF more plain/vanilla DWARF5 (type units + fission)?)
>>>>
>>>>
>>>> I originally disliked type signatures and favored using mangled names
>>>> because the mangled names contained the DeclContext necessary to find types
>>>> in the PCM. But if we can squeeze the DeclContext somewhere else, that’s
>>>> fine.
>>>>
>>>> From the discussion we had on FlagExternalTypeRef I got the impression
>>>> that long-form forward declarations are starting to look more attractive:
>>>> If every external type reference is a reference to a forward declaration
>>>> that has a complete decl context,
>>>>
>>>
>>> While that's conveniently what we output currently, I'm not sure it's a
>>> great idea to rely on it. We might one day optimize type references (&
>>> we'll certainly need to optimize non-type references like member functions,
>>> etc - since emitting a stub for those would start to, more visibly, reduce
>>> the benefit of doing this work in the first place, I would imagine) so that
>>> when there's no contents (which will be more common once we can reference
>>> members directly with Bag O DWARF) we just have a DW_AT_type encoded as a
>>> DW_FORM_ref_sig8 directly.
>>>
>>>
>>> Emitting a ref_sig8 directly only works in C++, where the ODR guarantees
>>> that each signature is globally unique.
>>>
>>
>> An identifier will be needed for types without the ODR too - DWARF
>> suggests hashing the type description (the actual DIE hierarchy) the same
>> as is done for Fission.
>>
>>
>> Using a DWARF type hash would be terrible, because we’d have to compile
>> all external types to DWARF to get to the hash, or have clang link against
>> libDWARF to read the hash from the module. Instead we’re using the
>> clangIndexer’s USRs as identifiers.
>>
>
> Right, I agree that using the DWARF type hash isn't really suitable here
> (in part I figured you'd need a hash that had more things in it - there are
> lots of things you can change in a type that the DWARF type unit doesn't
> have in it, so doesn't hash - but are important parts of the type as
> described by a Clang module (it's far more precise, of course - that's the
> benefit of using one)).
>
> What are ClangIndexer's USRs? How are they built? I take it they're a hash
> of the type in the Clang AST? That's pretty much what I had in mind.
>
>
> They are textual identifiers that syntactically almost look like C++
> mangled names. We can use them wherever we use C++ mangled names.
>

Oh. Well, then, yes, you can use them in the same way - use the string as
the typeref identifier, then let LLVM hash it & use it as a sig8 as it does
for type units already.
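
A minimal sketch of turning a textual type identifier into a 64-bit signature. It assumes an MD5-based scheme in which eight bytes of the digest are read as a little-endian integer; which bytes and which byte order LLVM actually uses should be checked against its source, and the identifier strings below are made up for illustration.

```python
import hashlib

def type_signature(identifier: str) -> int:
    """Derive a 64-bit signature from a textual type identifier.

    Assumption: MD5 of the UTF-8 string, last eight digest bytes,
    interpreted as a little-endian unsigned integer."""
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "little")

# Hypothetical identifiers; any stable, unique string would do.
sig_a = type_signature("_ZTS6TypeName")
sig_b = type_signature("_ZTS9OtherName")
```

The point of the scheme is that the producer never has to build the DWARF for the type to know its signature; the string alone is enough.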


>
>
>
>>
>>
>>> A consumer would have to search for the type definition in all modules
>>> (rather than finding it via the decl context), which is less efficient, but
>>> still works because of the ODR. Thus we’re only depending on the long-form
>>> forward declarations in non-ODR languages, where we need the decl context
>>> to disambiguate between non-unique signatures.
>>>
>>
>> We shouldn't use the mangled name as the hash for types that don't have
>> an ODR. The current type units implementation does not create type units
>> for non-ODR types. (the same way we only do type refs for ODR types, the
>> current type units implementation piggy backs on exactly the same property
>> and for the same reasons)
>>
>>
>> I’m retracting my argument about having to disambiguate non-unique
>> signatures. While it is perfectly legal for two C modules to define a
>> conflicting type with the same name; Clang requires an ODR for modules (
>> http://clang.llvm.org/docs/Modules.html#module-declaration:
>>  "Each module shall have a single definition.”). If we encode the module
>> name in the USR (that’s a todo), we’ll end up with unique identifiers for
>> all types.
>>
>
> OK, so you're suggesting the ID/hash you'd use for C types would include
> the module and thus be a globally unique identifier much like the type hash
> for C++ types? Sounds good to me.
>
> Yes.
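
A minimal sketch of why folding the module name into the identifier helps for C, assuming two modules that each define a type with the same name. The `type_key` format and the module names are hypothetical and do not reflect how real USRs are laid out.

```python
def type_key(module: str, type_name: str) -> str:
    """Build an identifier that includes the defining module (made-up format)."""
    return f"{module}:{type_name}"

# Two C modules that both define `struct point`.
definitions = [("LibA", "struct point"), ("LibB", "struct point")]

# Keyed by name alone, the two definitions collapse into one identifier.
by_name_only = {name for _, name in definitions}

# Keyed by module + name, each definition gets its own identifier.
by_module_and_name = {type_key(m, n) for m, n in definitions}
```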
>
>
>> Searching all .dwos for DWARF type units is an existing problem for pure
>> DWARF type units with Fission - it might be worth considering (talking to
>> the committee) what would be the right solution for DWARF type
>> units+Fission and then seeing if that expands to module debug info.
>>
>>
>>>
>>>
>>> with a DW_TAG_module at the root of the decl context chain,
>>>>
>>>
>>> This ^ is something I didn't have in mind and would complicate things
>>> somewhat. I'd really like to keep things as close to the existing standard
>>> type unit + split dwarf standards as possible except where necessary to do
>>> otherwise.
>>>
>>>
>>> In what way does it complicate things?
>>>
>>
>> Debuggers probably aren't set up to handle types inside modules in this
>> way (at least for C++) - I've never seen this construct & imagine debuggers
>> haven't either & may have some trouble with it (how to name it/make it
>> nameable by the user, etc - maybe they'd end up putting a module name
>> prefix on it or something that would be unhelpful?)
>>
>>
> It would be nice to test this out with, e.g., GDB to see if it actually is
> confused by this.
>
>
>> I'd really like to stick to known/existing DWARF as much as possible &
>> consider very carefully anywhere we diverge from that.
>>
>>
>> I think we should stick to the standard as much as possible and describe
>> the language as faithfully as possible in DWARF, which means that if types
>> are defined in a C++ module, they should show up inside of a module;
>> debuggers have to make some changes anyway in order to support all of this,
>>
>
> But that's partly my point: I don't think they do. For the non-modular
> side of this, I believe it could be used in GDB today using only fission
> and type units... are there parts you believe cannot/do not/would not work
> there?
>
>
>> so we might as well get the format right.
>> It would be elegant if we could use the TAG_module (and/or the skeleton
>> CU) at the top of the DeclCtx chain to identify the module that contains
>> the definition instead of having to search through all modules that were
>> imported by the current CU. It’s not the end of the world if we can’t do
>> it, but it does mean slower lookup times for both DWARF and AST consumers.
>>
>
> I too, would like a fast lookup mechanism here - assuming the current
> state of the world (fission + type units, without module debug info or
> anything) is already suffering from this performance problem, I'd be
> inclined to look for a solution to that situation first (in the DWARF
> committee), if it is a problem (which, to the best of my understanding, it
> must be a problem - I don't know of any way the debugger could know which
> .dwo to find a type unit in short of searching all of them). Then see how
> that might be applicable to module debug info.
>
> Creating a solution for module debug info (by putting the type
> declarations in the module) leaves the existing, shipped feature of DWARF,
> short of something it clearly needs just as much as modules would.
>
> Actually, I suppose this doesn't quite come up for fission+type units
> because in a normal implementation, type units would only be emitted into
> the .dwos that need them, so you can always look locally. (& once you've
> created a .dwp it's all together anyway)
>
> This would only come up with a small change to Clang's output - Clang
> could use ref_sig8 across modules, rather than emitting declarations. It's
> perhaps not clear that this violates DWARF, but (much like the proposed
> module debug info) might surprise debuggers enough that they wouldn't know
> where to find the types. I haven't tried prototyping this.
>
> (this is perhaps the changes necessary for a debugger to support module
> debug info?)
>
>
> Yes. My assumption was that current fission consumers do not generally
> know how to look up a ref_sig8 when there is more than one skeleton cu.
>

It's a fair question - possibly not, I'm not sure what GDB does.

(though debuggers already need to be able to search for type definitions to
match to declarations via name - one of the reasons I was thinking we could
use type units more even without module debug info - emitting sig8 refs
even in object files that don't contain the type unit (this is easily
correct with normal linked debug info without fission, because the type
would be in the final executable anyway - but probably underspecified in
the presence of fission+type units), still seems like a small-ish extension
- with, as discussed, a possible perf improvement by providing some kind of
sig8 list for each CU so the debugger could know where to go for types (and
this should be no worse than just emitting a type declaration where we do
today - and possibly better, because it wouldn't be name based and you
could tell by just scanning type unit headers even without any helper list))
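
A minimal sketch of the helper-list idea, assuming the consumer can obtain the list of type unit signatures present in each .dwo file, for example by scanning type unit headers. The file names, the signature values, and both function names are hypothetical.

```python
def build_signature_index(type_units_by_dwo):
    """Invert {dwo path: [sig8, ...]} into {sig8: dwo path}.

    If several files carry the same type unit, the first one seen wins."""
    index = {}
    for dwo_path, signatures in type_units_by_dwo.items():
        for sig in signatures:
            index.setdefault(sig, dwo_path)
    return index

def find_type_unit(index, sig):
    """Return the .dwo that holds the type unit, or None if unknown."""
    return index.get(sig)

# Signatures found in each file's type unit headers (made-up values).
scanned = {
    "a.dwo": [0x01, 0x02],
    "b.dwo": [0x02, 0x03],
}
index = build_signature_index(scanned)
```

With such an index a ref_sig8 resolves to a single file without any name comparison; without it, the consumer falls back to opening every candidate file.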


>
>
>
> and a DW_AT_name+DW_AT_signature at the other end, we would have all the
>>>> information we need without introducing any further LLVM-specific DWARF
>>>> extensions. To look up an external type from the PCM, the consumer imports
>>>> the DW_TAG_module and deserializes the type found by declcontext+name. To
>>>> load the type from DWARF, the consumer grabs the signature from the forward
>>>> declaration and magically (1) finds the PCM and looks up the type by
>>>> signature (2).
>>>>
>>>> (1) My suggestion is to extend LLVM so it can put the DW_TAG_module
>>>> with the forward declaration inside the skeleton compile unit (which has
>>>> the path to the PCM and its DWOid).
>>>> (2) On ELF with type units this works out of the box,
>>>>
>>>
>>> Not necessarily - the use of DW_TAG_modules in the scope chain might
>>> confuse/break things. It's pretty unprecedented/non-standard, I would think?
>>>
>>>
>>> Its use for C++ is unprecedented, but the tag itself is very standard.
>>>
>>
>> Sure - I get that the tag is standard, but what it means (& especially
>> what it means to have types inside it) in the context of C++ debugging is
>> something I'm fairly concerned about.
>>
>>> on MachO without type units we need some kind of index mapping signature
>>>> -> DIE (bag of DWARF style?).
>>>>
>>>
>>> I was rather hoping you guys would implement type units (since they'll
>>> be a step towards Bag O DWARF anyway) on MachO... - at least for this case.
>>> They wouldn't have to be COMDAT'd or each in their own section, they'd just
>>> be in a debug_types section one after the other in the module .o file.
>>>
>>>
>>> How does a consumer today find a type unit for a given signature? Does
>>> it build its own index based on the signatures of the COMDAT sections? DWP
>>> files define a .debug_tu_index accelerator section to this end, but how is
>>> this normally handled?
>>>
>>
>> .dwo files have no COMDAT sections - you just put all the type units in
>> them directly.
>>
>> In any case, that's still going to be a perf hit, because you'll at least
>> have to read some part of every .dwo file to see which types are in it.
>>
>> I would believe/assume/imagine this is a performance problem for Type
>> Units + Fission as they are defined today (as I've alluded to above) & may
>> be worth considering general DWARF solutions (talk with the committee sort
>> of stuff) that may generalize to module debug info too. It's also possible
>> I've missed something and there are existing solutions to the Type Units +
>> Fission performance problem.
>>
>>
>> How pervasive is the use of .dwp package files? It sounds like it would
>> be a solution designed for this.
>>
>
> Not sure.
>
> For Google, dwps are only used for archive (much like MacOS's dsyms, if I
> understand correctly). For iterative development it's more expensive to
> build the dwp than to just load the debug info on the fly from .dwos as
> needed.
>
>
> Yes, that sounds a lot like dsym bundle usage patterns.
> What do you think about adding a .debug_cu_index section to the module
> dwarf to make the development version just as fast?
>

I haven't looked at the cu_index feature in DWARF 5. A quick googling
doesn't locate a draft version (I guess they're not public) but does hint
at (looking at some of Cary's GCC patch notes) a tu_index which is probably
more relevant, but I still know next to nothing about its format, etc.

Having the index in the module would probably still be a bit sub-optimal
for Google's usage, because we have objects (including .dwos) on a
networked filesystem. The ability to avoid even touching the .dwos would
speed up our debugger scenarios - but there's a bunch of work Cary was
planning to do before he left that we never got around to: playing with the
idea of creating something like a thin DWP file that would be generated at
link time, potentially stripped from the object file - so it wouldn't cost
us executable size, but still could contain efficient indexes, etc.

My point being, I wouldn't necessarily want to prioritize our abstract
needs when we haven't prioritized our concrete needs enough to implement
some of the build workflow pieces that would be relevant to putting these
indices in the object files and then efficiently removing them later (or
putting them in the .dwo files but pulling them out into a meta-index
later, etc).

To come back to your original question: I'm not sure how much faster it
would make the developer scenario, because you'd still need to visit/read
from each .dwo file. Not as much, mind you, but it might still be enough to
dominate compared to the benefit of not having to parse the DWARF. I'm not
sure.

- Dave


>
> -- adrian
>
>
>
>>
>>
>>
>>>
>>>
>>> Assuming that many external types will share a similar DeclContext
>>>> prefix I am not very worried by the space needed to store the long forward
>>>> references. Compared to storing the mangled name for every type
>>>>
>>>
>>> What I was picturing wasn't to store the mangled name anywhere - but to
>>> have, in the module object file somewhere a table from type hash to <useful
>>> way of accessing a type in a module, which I hope is just a byte offset or
>>> something similarly cheap, small, and fast - not a table lookup, etc, but
>>> whatever the table lookup has as its values already>.
>>>
>>>
>>> With the decl context and the TAG_module on top a consumer knows which
>>> module to import (fast) and can do the import by name (hash lookup). With
>>> the table in the module object and no TAG_module in the decl context, it
>>> has to scan all modules for the type signature (multiple hash lookups) and
>>> can directly import the type (fast).
>>>
>>> -- adrian
>>>
>>>
>>>
>>>> it will often actually take up less space. Also, on Darwin at least,
>>>> llvm-dsymutil can strip them out of the way after resolving the external
>>>> type references.
>>>>
>>>> -- adrian
>>>>
>>>>
>>>> - Dave
>>>>
>>>> On Wed, May 6, 2015 at 3:43 PM, Adrian Prantl <aprantl at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On May 6, 2015, at 3:24 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 6, 2015 at 3:15 PM, Adrian Prantl <aprantl at apple.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On May 6, 2015, at 2:52 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 6, 2015 at 2:45 PM, Adrian Prantl <aprantl at apple.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On May 6, 2015, at 2:35 PM, Eric Christopher <echristo at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> That said, add enough to the name for hashing purposes to make it
>>>>>>> hash uniquely? Or you can go down the path of hashing the type similar to
>>>>>>> the fission CU hashing (which is what type units were arguably designed to
>>>>>>> do in the first place if you take a look at the standard, we just only use
>>>>>>> them for ODR compliant languages etc right now).
>>>>>>>
>>>>>>>
>>>>>>> I suppose one could hash the entire module configuration + the
>>>>>>> mangled name and get something that is relatively stable.
>>>>>>> For implementation reasons it would be terrible to do the full
>>>>>>> fission hashing because that would mean that we would actually have to look
>>>>>>> up (and deserialize the type) in order to get to its ID when emitting an
>>>>>>> external type reference, which would void at least some of the performance
>>>>>>> gains we want from module debugging.
>>>>>>>
>>>>>>
>>>>>> I thought you were proposing using the mangled name of the type for
>>>>>> the identifier anyway? Perhaps I misunderstood - what are you proposing to
>>>>>> use? In any case, I'd prefer to see whatever it is hashed and used as the
>>>>>> type unit signature for compatibility with DWARF5, rather than adding an
>>>>>> extra/separate/new/non-standard way to do cross-unit/cross-fission type
>>>>>> references.
>>>>>>
>>>>>>
>>>>>> In the IR I’d /like/ to have a DIExternalTypeRef(DW_TAG_class_type,
>>>>>> !”_ZTC6TypeName”, !1) with !1 being a reference to either the DIModule or
>>>>>> the skeleton CU. Then the backend would emit the hash of the name if type
>>>>>> units are enabled (C++/gdb) or the mangled name (+ the accelerator table
>>>>>> entry) otherwise (ObjC and/or Darwin). If there is significant pushback to
>>>>>> the latter, I’d be willing to have the backend emit a hash in both cases
>>>>>> but we’d have to be careful about what exactly to hash for all the
>>>>>> aforementioned reasons.
>>>>>>
>>>>>
>>>>> I don't follow - if the mangled name is sufficient, then a hash of the
>>>>> mangled name should be.... what am I missing?
>>>>>
>>>>> Nothing, these are two separate issues:
>>>>>
>>>>> If you don't have an ODR to rely on, then a mangled name seems
>>>>> insufficient just as the hash would be.
>>>>>
>>>>>
>>>>> The decision for mangled name vs hash is motivated by the mangled name
>>>>> also doubling as a key to look up the type in the AST.
>>>>> The other problem is (partially) solved by the accelerator table entry
>>>>> that associates the mangled name with a module. I’m starting to think now
>>>>> that it might be better to include a fission-style forward declaration +
>>>>> decl context into the TAG_module instead. The DWARF-style decl context
>>>>> could in theory be smaller than the mangled name because two types could
>>>>> share common ancestors and then we could emit the same hash of the mangled
>>>>> names as we do for type units. But let’s discuss this when it comes up
>>>>> (together with the patch that makes use of DIExternalTypeRef).
>>>>>
>>>>> -- adrian
>>>>>
>>>>
>