[PATCH] Have clang list the imported modules in the debug info

Mon Aug 24 15:26:14 PDT 2015

+static void escape(llvm::raw_ostream &OS, StringRef String) {
+  for (auto C: String) {
+    if (strchr("\\\"", C))
+      OS << '\\';
+    OS << C;
+  }
+}

Do we not have an existing function somewhere to do that? Also, that strchr
call is a very complex way of writing (C == '\\' || C == '"')

On Mon, Aug 24, 2015 at 2:01 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Mon, Aug 24, 2015 at 1:23 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Aug 19, 2015, at 1:20 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Mon, Aug 10, 2015 at 5:00 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> On Jul 24, 2015, at 12:33 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>> *reads back through the thread*
>>>
>>>
>>> appreciated, it’s long :-)
>>>
>>> So what I originally had in mind about a year ago when we discussed
>>> this, was that the module data could have an extra table from type hash to
>>> whatever useful internal representation to find the type in the PCM.
>>>
>>>
>>> It turned out that the most useful internal type representation to find
>>> a type in a PCM is the type’s DeclContext+Name; this is how (surprise!)
>>> clang looks up types in a PCM and the format is supposed to be fast for
>>> these kind of lookups.
>>>
>>
>> Still, I would imagine there would be some kind of direct access (the
>> offset in the file, or somesuch) rather than actually having to go through
>> hashtables, etc. No? (how does one module refer to types in another module?
>> Really by name?)
>>
>>
>> Entities in PCMs have local integer IDs (just consecutive numbers) that
>> are used to encode references inside a record on disk. An external
>> reference gets a global ID which is the local ID + the ID of the other
>> module. This numbering scheme of course only makes sense within a module
>> (and perhaps also within a chained PCH). Every (local and global) ID maps
>> to an entry in the PCM’s identifier table. When deserializing, the
>> in-memory global IdentifierTable is built and each module gets a map that
>> remaps its internal global IDs to the “global” global IDs.
>>
>> For debug info these IDs are not very useful, because they are not
>> resilient against even the smallest additive change to the module. We could
>> add an integer attribute with the module-internal entity ID to the forward
>> declaration that the debugger can use if the module hash and the CU’s dwoid
>> are matching, but I’m not yet convinced that it would be worth it. Adding
>> it to the definition of the type in the module dwarf seems not worth it
>> because then we already have to do a similarly expensive lookup to find the
>> module containing the definition.
>>
>
> It seems that ideally a module-aware debugger would not consult the DWARF
> at all, so I wouldn't advocate having the module format type ID in the
> DWARF, but in a side table (either part of the module itself, or just a
> section of its own in the module-as-object-file).
>
> Why is it important that the IDs be resilient to changes in the module?
> I'm suggesting that non-module debug info should only use type hashes, then
> within the module itself there would be a hash-to-ID mapping (a flat table,
> probably, wouldn't need to be anything fancy). This would keep the DWARF
> less polluted by module concepts - it'd just be up to the consumer "oh,
> here's a type hash identifier, resolve that to an AST - either by looking
> in DWARF and building an AST from the DWARF type unit with the matching
> hash, or by looking in the module and finding the ID/loading the type"?
>
>
>>
>>
>>> Everything else would just be DWARF with type units and fission (with
>>> the slight wrinkle of type units that aren't resolvable within a single
>>> object file - they could reference cross-object/dwo file) - emitting a
>>> fission CU for each referenced module.
>>>
>>> Needing modules to disambiguate/avoid collisions/support non-odr
>>> languages wasn't something I understood/had considered back then. That
>>> explains the need to add module references to the CU, so the debugger can
>>> know which modules to search for the types in (& doesn't just go loading
>>> all of them, etc).
>>>
>>> I would still picture this as "normal type units + a table in the module
>>> to resolve types", but if you guys particularly like using the mangled
>>> string name (or other identifier) in the DWARF that may avoid the need for
>>> an intermediate table (but it doesn't sound like you are avoiding an
>>> intermediate table - you said something about having an
>>> accelerator-table-like thing to aid in the DWARF->AST mapping? So could
>>> that be key'd of the type hash/signature we're using, thus keeping the
>>> DWARF more plain/vanilla DWARF5 (type units + fission)?)
>>>
>>>
>>> I originally disliked type signatures and favored using mangled names
>>> because the mangled names contained the DeclContext necessary to find types
>>> in the PCM. But if we can squeeze the DeclContext somewhere else, that’s
>>> fine.
>>>
>>> From the discussion we had on FlagExternalTypeRef I got the impression
>>> that long-form forward declarations are starting to look more attractive:
>>> If every external type reference is a reference to a forward declaration
>>> that has a complete decl context,
>>>
>>
>> While that's conveniently what we output currently, I'm not sure it's a
>> great idea to rely on it. We might one day optimize type references (&
>> we'll certainly need to optimize non-type references like member functions,
>> etc - since emitting a stub for those would start to, more visibly, reduce
>> the benefit of doing this work in the first place, I would imagine) so that
>> when there's no contents (which will be more common once we can reference
>> members directly with Bag O DWARF) we just have a DW_AT_type encoded as a
>> DW_FORM_ref_sig8 directly.
>>
>>
>> Emitting a ref_sig8 directly only works in C++, where the ODR guarantees
>> that each signature is globally unique.
>>
>
> An identifier will be needed for types without the ODR too - DWARF
> suggests hashing the type description (the actual DIE hierarchy) the same
> as is done for Fission.
>
>
>> A consumer would have to search for the type definition in all modules
>> (rather than finding it via the decl context), which is less efficient, but
>> still works because of the ODR. Thus we’re only depending on the long-form
>> forward declarations in non-ODR languages, where we need the decl context
>> to disambiguate between non-unique signatures.
>>
>
> We shouldn't use the mangled name as the hash for types that don't have an
> ODR. The current type units implementation does not create type units for
> non-ODR types. (the same way we only do type refs for ODR types, the
> current type units implementation piggy backs on exactly the same property
> and for the same reasons)
>
> Searching all .dwos for DWARF type units is an existing problem for pure
> DWARF type units with Fission - it might be worth considering (talking to
> the committee) what would be the right solution for DWARF type
> units+Fission and then seeing if that expands to module debug info.
>
>
>>
>>
>> with a DW_TAG_module at the root of the decl context chain,
>>>
>>
>> This ^ is something I didn't have in mind and would complicate things
>> somewhat. I'd really like to keep things as close to the existing standard
>> type unit + split dwarf standards as possible except where necessary to do
>> otherwise.
>>
>>
>> In what way does it complicate things?
>>
>
> Debuggers probably aren't setup to handle types inside modules in this way
> (at least for C++) - I've never seen this construct & imagine debuggers
> haven't either & may have some trouble with it (how to name it/make it
> nameable by the user, etc - maybe they'd end up putting a module name
> prefix on it or something that would be unhelpful?)
>
> I'd really like to stick to known/existing DWARF as much as possible &
> consider very carefully anywhere we diverge from that.
>
>
>>
>>
>>
>>> and a DW_AT_name+DW_AT_signature at the other end, we would have all the
>>> information we need without introducing any further LLVM-specific DWARF
>>> extensions. To look up an external type from the PCM, the consumer imports
>>> the DW_TAG_module and deserializes the type found by declcontext+name. To
>>> load the type from DWARF, the consumer grabs the signature from the forward
>>> declaration and magically (1) finds the PCM and looks up the type by
>>> signature (2).
>>>
>>> (1) My suggestion is to extend LLVM so it can put the DW_TAG_module with
>>> the forward declaration inside the skeleton compile unit (which has the
>>> path to the PCM and its DWOid).
>>> (2) On ELF with type units this works out of the box,
>>>
>>
>> Not necessarily - the use of DW_TAG_modules in the scope chain might
>> confuse/break things. It's pretty unprecedented/non-standard, I would think?
>>
>>
>> It’s use for C++ is unprecedented, but the tag itself very standard.
>>
>
> Sure - I get that the tag is standard, but what it means (& especially
> what it means to have types inside it) in the context of C++ debugging is
> something I'm fairly concerned about.
>
>> on MachO without type units we need some kind of index mapping signature
>>> -> DIE (bag of DWARF style?).
>>>
>>
>> I was rather hoping you guys would implement type units (since they'll be
>> a step towards Bag O DWARF anyway) on MachO... - at least for this case.
>> They wouldn't have to be COMDAt'd or each in their own section, they'd just
>> be in a debug_types section one after the other in the module .o file.
>>
>>
>> How does a consumer today find a type unit for a given signature? Does it
>> build its own index based on the signatures of the COMDAT sections? DWP
>> files define a .debug_tu_index accelerator section to this end, but how is
>> this normally handled?
>>
>
> .dwo files have no COMDAT sections - you just put all the type units in
> them directly.
>
> In any case, that's still going to be a perf hit, because you'll at least
> have to read some part of every .dwo file to see which types are in it.
>
> I would believe/assume/imagine this is a performance problem for Type
> Units + Fission as they are defined today (as I've alluded to above) & may
> be worth considering general DWARF solutions (talk with the committee sort
> of stuff) that may generalize to module debug info too. It's also possible
> I've missed something and there are existing solutions to the Type Units +
> Fission performance problem.
>
>
>>
>>
>> Assuming that many external types will share a similar DeclContext prefix
>>> I am not very worried by the space needed to store the long forward
>>> references. Compared to storing the mangled name for every type
>>>
>>
>> What I was picturing wasn't to storet the mangled name anywhere - but to
>> have, in the module object file somewhere a table from type hash to <useful
>> way of accessing a type in a module, which I hope is just a byte offset or
>> something similary cheap, small, and fast - not a table lookup, etc, but
>> whatever the table lookup has as its values already>.
>>
>>
>> With the decl context and the TAG_module on top a consumer knows which
>> module to import (fast) and can do the import by name (hash lookup). With
>> the table in the module object and no TAG_module in the decl context, it
>> has to scan all modules for the type signature (multiple hash lookups) and
>> can directly import the type (fast).
>>
>> -- adrian
>>
>>
>>
>>> it will often actually take up less space. Also, on Darwin at least,
>>> llvm-dsymutil can strip them out of the way after resolving the external
>>> type references.
>>>
>>> -- adrian
>>>
>>>
>>> - Dave
>>>
>>> On Wed, May 6, 2015 at 3:43 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>>
>>>>
>>>> On May 6, 2015, at 3:24 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, May 6, 2015 at 3:15 PM, Adrian Prantl <aprantl at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On May 6, 2015, at 2:52 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 6, 2015 at 2:45 PM, Adrian Prantl <aprantl at apple.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On May 6, 2015, at 2:35 PM, Eric Christopher <echristo at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> That said, add enough to the name for hashing purposes to make it
>>>>>> hash uniquely? Or you can go down the path of hashing the type similar to
>>>>>> the fission CU hashing (which is what type units were arguably designed to
>>>>>> do in the first place if you take a look at the standard, we just only use
>>>>>> them for ODR compliant languages etc right now).
>>>>>>
>>>>>>
>>>>>> I suppose one could hash the entire module configuration + the
>>>>>> mangled name and get something that is relatively stable.
>>>>>> For implementation reasons it would be terrible to do the full
>>>>>> fission hashing because that would mean that we would actually have to look
>>>>>> up (and deserialize the type) in order to get to its ID when emitting an
>>>>>> external type reference, which would void at least some of the performance
>>>>>> gains we want from module debugging.
>>>>>>
>>>>>
>>>>> I thought you were proposing using the mangled name of the type for
>>>>> the identifier anyway? Perhaps I misunderstood - what are you proposing to
>>>>> use? In any case, I'd prefer to see whatever it is hashed and used as the
>>>>> type unit signature for compatibility with DWARF5, rather than adding an
>>>>> extra/separate/new/non-standard way to do cross-unit/cross-fission type
>>>>> references.
>>>>>
>>>>>
>>>>> In the IR I’d /like/ to have a DIExternalTypeRef(DW_TAG_class_type,
>>>>> !”_ZTC6TypeName”, !1) with !1 being a reference to either the DIModule or
>>>>> the skeleton CU. Then the backend would emit the hash of the name if type
>>>>> units are enabled (C++/gdb) or the mangled name (+ the accelerator table
>>>>> entry) otherwise (ObjC and/or Darwin). If there is significant pushback to
>>>>> the latter, I’d be willing to have the backend emit a hash in both cases
>>>>> but we’d have to careful about what to exactly to hash for all the
>>>>> aforementioned reasons.
>>>>>
>>>>
>>>> I don't follow - if the mangled name is sufficient, then a hash of the
>>>> mangled name should be.... what am I missing?
>>>>
>>>> Nothing, these are two separate issues:
>>>>
>>>> If you don't have an ODR to rely on, then a mangled name seems
>>>> insufficient just as the hash would be.
>>>>
>>>>
>>>> The decision for mangled name vs hash is motivated by the mangled name
>>>> also doubling as a key to look up the type in the AST.
>>>> The other problem is (partially) solved by the accelerator table entry
>>>> that associates the mangled name with a module. I’m starting to think now
>>>> that it might be better to include a fission-style forward declaration +
>>>> decl context into the TAG_module instead. The DWARF-style decl context
>>>> could in theory be smaller than the mangled name because two types could
>>>> share common ancestors and then we could emit the same hash of the mangled
>>>> names as we do for type units. But let’s discuss this when it comes up
>>>> (together with the patch that makes use of DIExternalTypeRef).
>>>>
>>>> -- adrian
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150824/73e13186/attachment-0001.html>