Fwd: [PATCH] Have clang list the imported modules in the debug info

Wed Aug 19 13:23:23 PDT 2015

(add the right list)
---------- Forwarded message ----------
From: David Blaikie <dblaikie at gmail.com>
Date: Wed, Aug 19, 2015 at 1:20 PM
Subject: Re: [PATCH] Have clang list the imported modules in the debug info
To: Adrian Prantl <aprantl at apple.com>
Cc: Eric Christopher <echristo at gmail.com>, Zachary Turner <zturner at google.com>,
"Robinson, Paul" <Paul_Robinson at playstation.sony.com>,
Richard Smith <richard at metafoo.co.uk>, llvm cfe <cfe-commits at cs.uiuc.edu>,
Greg Clayton <gclayton at apple.com>, Sean Callanan <scallanan at apple.com>

On Mon, Aug 10, 2015 at 5:00 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Jul 24, 2015, at 12:33 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
> *reads back through the thread*
>
>
> appreciated, it’s long :-)
>
> So what I originally had in mind about a year ago when we discussed this,
> was that the module data could have an extra table from type hash to
> whatever useful internal representation to find the type in the PCM.
>
>
> It turned out that the most useful internal type representation to find a
> type in a PCM is the type’s DeclContext+Name; this is how (surprise!) clang
> looks up types in a PCM and the format is supposed to be fast for these
> kinds of lookups.
>

Still, I would imagine there would be some kind of direct access (the
offset in the file, or somesuch) rather than actually having to go through
hashtables, etc. No? (how does one module refer to types in another module?
Really by name?)
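One possible shape for that kind of direct access is a flat array of fixed-size (type signature, byte offset) records, sorted by signature, which a reader can binary-search in place without any name hashing. The C++ sketch below is only an illustration under that assumption. `TypeOffsetEntry`, `buildTable`, and `findOffset` are hypothetical names, and the offsets are assumed to point at a type's serialized form in the module file. It does not describe clang's actual PCM layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// One fixed-size record in a hypothetical per-module side table:
// type signature -> byte offset of the type's serialized form in the module.
struct TypeOffsetEntry {
  std::uint64_t signature;
  std::uint64_t offset;
};

// Built once when the module is written. Sorting by signature lets a
// reader binary-search the table in place, with no hashing of names.
std::vector<TypeOffsetEntry> buildTable(std::vector<TypeOffsetEntry> entries) {
  std::sort(entries.begin(), entries.end(),
            [](const TypeOffsetEntry &a, const TypeOffsetEntry &b) {
              return a.signature < b.signature;
            });
  return entries;
}

// O(log n) lookup: returns the offset to seek to, or nullopt if the module
// does not define a type with this signature.
std::optional<std::uint64_t>
findOffset(const std::vector<TypeOffsetEntry> &table, std::uint64_t signature) {
  auto it = std::lower_bound(table.begin(), table.end(), signature,
                             [](const TypeOffsetEntry &e, std::uint64_t sig) {
                               return e.signature < sig;
                             });
  if (it == table.end() || it->signature != signature)
    return std::nullopt;
  return it->offset;
}
```

Sorted fixed-size records are just one option. An on-disk hash table would replace the O(log n) search with O(1) probes, at the cost of a more complex layout.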

>
> Everything else would just be DWARF with type units and fission (with the
> slight wrinkle of type units that aren't resolvable within a single object
> file - they could reference cross-object/dwo file) - emitting a fission CU
> for each referenced module.
>
> Needing modules to disambiguate/avoid collisions/support non-ODR languages
> wasn't something I understood/had considered back then. That explains the
> need to add module references to the CU, so the debugger can know which
> modules to search for the types in (& doesn't just go loading all of them,
> etc).
>
> I would still picture this as "normal type units + a table in the module
> to resolve types", but if you guys particularly like using the mangled
> string name (or other identifier) in the DWARF that may avoid the need for
> an intermediate table (but it doesn't sound like you are avoiding an
> intermediate table - you said something about having an
> accelerator-table-like thing to aid in the DWARF->AST mapping? So could
> that be keyed off the type hash/signature we're using, thus keeping the
> DWARF more plain/vanilla DWARF5 (type units + fission)?)
>
>
> I originally disliked type signatures and favored using mangled names
> because the mangled names contained the DeclContext necessary to find types
> in the PCM. But if we can squeeze the DeclContext somewhere else, that’s
> fine.
>
> From the discussion we had on FlagExternalTypeRef I got the impression
> that long-form forward declarations are starting to look more attractive:
> If every external type reference is a reference to a forward declaration
> that has a complete decl context,
>

While that's conveniently what we output currently, I'm not sure it's a
great idea to rely on it. We might one day optimize type references (&
we'll certainly need to optimize non-type references like member functions,
etc - since emitting a stub for those would start to, more visibly, reduce
the benefit of doing this work in the first place, I would imagine) so that
when there's no contents (which will be more common once we can reference
members directly with Bag O DWARF) we just have a DW_AT_type encoded as a
DW_FORM_ref_sig8 directly.

> with a DW_TAG_module at the root of the decl context chain,
>

This ^ is something I didn't have in mind and would complicate things
somewhat. I'd really like to keep things as close to the existing standard
type unit + split dwarf standards as possible except where necessary to do
otherwise.

> and a DW_AT_name+DW_AT_signature at the other end, we would have all the
> information we need without introducing any further LLVM-specific DWARF
> extensions. To look up an external type from the PCM, the consumer imports
> the DW_TAG_module and deserializes the type found by declcontext+name. To
> load the type from DWARF, the consumer grabs the signature from the forward
> declaration and magically (1) finds the PCM and looks up the type by
> signature (2).
>
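The consumer side of this proposal works in two steps. First it walks from the forward declaration up its scope chain to the DW_TAG_module. Then it looks the type up in the PCM by DeclContext+Name, or uses the signature instead. The C++ sketch below illustrates that walk over a hypothetical in-memory DIE tree. `Die`, `ExternalTypeKey`, and `resolveExternalType` are invented for illustration, and only the tag constants come from the DWARF specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// DWARF tag values (from the DWARF 4 specification).
constexpr std::uint16_t DW_TAG_class_type = 0x02;
constexpr std::uint16_t DW_TAG_structure_type = 0x13;
constexpr std::uint16_t DW_TAG_module = 0x1e;
constexpr std::uint16_t DW_TAG_namespace = 0x39;

// Hypothetical minimal DIE: just enough to walk the scope chain.
struct Die {
  std::uint16_t tag;
  std::string name;                       // DW_AT_name
  const Die *parent = nullptr;
  std::optional<std::uint64_t> signature; // DW_AT_signature, if present
};

// What a consumer needs to find the type in the PCM by DeclContext+Name,
// or alternatively by signature.
struct ExternalTypeKey {
  std::string module;                   // name of the root DW_TAG_module
  std::vector<std::string> declContext; // outermost scope first
  std::string name;
  std::optional<std::uint64_t> signature;
};

// Walks from a forward declaration up to the enclosing DW_TAG_module.
// Returns nullopt if the scope chain is not rooted in a module.
std::optional<ExternalTypeKey> resolveExternalType(const Die &fwdDecl) {
  ExternalTypeKey key;
  key.name = fwdDecl.name;
  key.signature = fwdDecl.signature;
  for (const Die *scope = fwdDecl.parent; scope; scope = scope->parent) {
    if (scope->tag == DW_TAG_module) {
      key.module = scope->name;
      // Scopes were collected innermost first; flip to outermost first.
      std::reverse(key.declContext.begin(), key.declContext.end());
      return key;
    }
    key.declContext.push_back(scope->name);
  }
  return std::nullopt;
}
```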
> (1) My suggestion is to extend LLVM so it can put the DW_TAG_module with
> the forward declaration inside the skeleton compile unit (which has the
> path to the PCM and its DWOid).
> (2) On ELF with type units this works out of the box,
>

Not necessarily - the use of DW_TAG_modules in the scope chain might
confuse/break things. It's pretty unprecedented/non-standard, I would think?

> on MachO without type units we need some kind of index mapping signature
> -> DIE (bag of DWARF style?).
>

I was rather hoping you guys would implement type units (since they'll be a
step towards Bag O DWARF anyway) on MachO... - at least for this case. They
wouldn't have to be COMDAT'd or each in their own section, they'd just be
in a debug_types section one after the other in the module .o file.
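A consumer could index such a section in a single linear pass. Each DWARF 4 type unit header carries its own 8-byte signature and the offset of its type DIE, so a DW_FORM_ref_sig8 reference can be resolved without COMDAT groups or per-unit sections. The C++ sketch below makes three assumptions: the data is 32-bit DWARF, it is little-endian, and the .debug_types section contains only complete type units. `readLE` and `indexTypeUnits` are hypothetical helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Reads a little-endian unsigned integer of `size` bytes starting at `pos`.
std::uint64_t readLE(const std::vector<std::uint8_t> &buf, std::size_t pos,
                     std::size_t size) {
  if (pos + size > buf.size())
    throw std::out_of_range("truncated .debug_types data");
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < size; ++i)
    value |= std::uint64_t(buf[pos + i]) << (8 * i);
  return value;
}

// Maps each type signature to the section offset of its type DIE, for a
// .debug_types section holding 32-bit DWARF 4 type units back to back.
std::map<std::uint64_t, std::uint64_t>
indexTypeUnits(const std::vector<std::uint8_t> &section) {
  std::map<std::uint64_t, std::uint64_t> index;
  std::size_t unitStart = 0;
  while (unitStart < section.size()) {
    // unit_length does not count the 4-byte length field itself.
    std::uint64_t unitLength = readLE(section, unitStart, 4);
    if (unitLength >= 0xfffffff0)
      throw std::runtime_error("64-bit DWARF is not handled in this sketch");
    // The rest of the header is 19 bytes: version(2), abbrev offset(4),
    // address_size(1), type_signature(8), type_offset(4).
    if (unitLength < 19 || unitStart + 4 + unitLength > section.size())
      throw std::runtime_error("malformed type unit header");
    if (readLE(section, unitStart + 4, 2) != 4)
      throw std::runtime_error("expected a DWARF 4 type unit");
    // Bytes 6-9 hold debug_abbrev_offset and byte 10 address_size (unused).
    std::uint64_t signature = readLE(section, unitStart + 11, 8);
    // type_offset is relative to the start of this unit's header.
    std::uint64_t typeOffset = readLE(section, unitStart + 19, 4);
    index[signature] = unitStart + typeOffset;
    unitStart += 4 + unitLength;
  }
  return index;
}
```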

>
> Assuming that many external types will share a similar DeclContext prefix
> I am not very worried by the space needed to store the long forward
> references. Compared to storing the mangled name for every type
>

What I was picturing wasn't to store the mangled name anywhere - but to
have, in the module object file somewhere, a table from type hash to <useful
way of accessing a type in a module, which I hope is just a byte offset or
something similarly cheap, small, and fast - not a table lookup, etc, but
whatever the table lookup has as its values already>.

> it will often actually take up less space. Also, on Darwin at least,
> llvm-dsymutil can strip them out of the way after resolving the external
> type references.
>
> -- adrian
>
>
> - Dave
>
> On Wed, May 6, 2015 at 3:43 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On May 6, 2015, at 3:24 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Wed, May 6, 2015 at 3:15 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> On May 6, 2015, at 2:52 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, May 6, 2015 at 2:45 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>>
>>>>
>>>> On May 6, 2015, at 2:35 PM, Eric Christopher <echristo at gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> That said, add enough to the name for hashing purposes to make it hash
>>>> uniquely? Or you can go down the path of hashing the type similar to the
>>>> fission CU hashing (which is what type units were arguably designed to do
>>>> in the first place if you take a look at the standard, we just only use
>>>> them for ODR compliant languages etc right now).
>>>>
>>>>
>>>> I suppose one could hash the entire module configuration + the mangled
>>>> name and get something that is relatively stable.
>>>> For implementation reasons it would be terrible to do the full fission
>>>> hashing because that would mean that we would actually have to look up (and
>>>> deserialize) the type in order to get to its ID when emitting an external
>>>> type reference, which would void at least some of the performance gains we
>>>> want from module debugging.
>>>>
>>>
>>> I thought you were proposing using the mangled name of the type for the
>>> identifier anyway? Perhaps I misunderstood - what are you proposing to use?
>>> In any case, I'd prefer to see whatever it is hashed and used as the type
>>> unit signature for compatibility with DWARF5, rather than adding an
>>> extra/separate/new/non-standard way to do cross-unit/cross-fission type
>>> references.
>>>
>>>
>>> In the IR I’d /like/ to have a DIExternalTypeRef(DW_TAG_class_type,
>>> !"_ZTC6TypeName", !1) with !1 being a reference to either the DIModule or
>>> the skeleton CU. Then the backend would emit the hash of the name if type
>>> units are enabled (C++/gdb) or the mangled name (+ the accelerator table
>>> entry) otherwise (ObjC and/or Darwin). If there is significant pushback to
>>> the latter, I’d be willing to have the backend emit a hash in both cases
>>> but we’d have to be careful about exactly what to hash for all the
>>> aforementioned reasons.
>>>
>>
>> I don't follow - if the mangled name is sufficient, then a hash of the
>> mangled name should be.... what am I missing?
>>
>> Nothing, these are two separate issues:
>>
>> If you don't have an ODR to rely on, then a mangled name seems
>> insufficient just as the hash would be.
>>
>>
>> The decision for mangled name vs hash is motivated by the mangled name
>> also doubling as a key to look up the type in the AST.
>> The other problem is (partially) solved by the accelerator table entry
>> that associates the mangled name with a module. I’m starting to think now
>> that it might be better to include a fission-style forward declaration +
>> decl context into the TAG_module instead. The DWARF-style decl context
>> could in theory be smaller than the mangled name because two types could
>> share common ancestors and then we could emit the same hash of the mangled
>> names as we do for type units. But let’s discuss this when it comes up
>> (together with the patch that makes use of DIExternalTypeRef).
>>
>> -- adrian
>>
>>
>
>
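The thread ends with two proposals. One is to hash the module configuration together with the mangled name. The other is to have the backend emit either that hash or the mangled name, depending on whether type units are in use. Both can be sketched in C++ as shown below. `ExternalTypeRef`, `typeSignature`, and `emitReference` are hypothetical stand-ins for the proposed DIExternalTypeRef handling, not LLVM APIs. FNV-1a is used only to keep the example self-contained; DWARF's own type signatures are the low-order 64 bits of an MD5 hash.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <variant>

// 64-bit FNV-1a, a stand-in hash chosen only to keep the sketch short.
std::uint64_t fnv1a64(std::string_view data,
                      std::uint64_t hash = 0xcbf29ce484222325ULL) {
  for (unsigned char c : data) {
    hash ^= c;
    hash *= 0x100000001b3ULL;
  }
  return hash;
}

// Hypothetical stand-in for the proposed DIExternalTypeRef: the mangled
// name plus a string describing the module configuration.
struct ExternalTypeRef {
  std::string mangledName;
  std::string moduleConfig; // e.g. module name + relevant build flags
};

// Folding the configuration into the hash keeps two types with the same
// mangled name apart when they come from differently configured modules.
std::uint64_t typeSignature(const ExternalTypeRef &ref) {
  std::uint64_t h = fnv1a64(ref.moduleConfig);
  h = fnv1a64(std::string_view("\0", 1), h); // separator between fields
  return fnv1a64(ref.mangledName, h);
}

// What the backend would emit: a signature when type units are in use,
// otherwise the mangled name (paired with an accelerator-table entry).
using EmittedRef = std::variant<std::uint64_t, std::string>;

EmittedRef emitReference(const ExternalTypeRef &ref, bool useTypeUnits) {
  if (useTypeUnits)
    return typeSignature(ref);
  return ref.mangledName;
}
```

The separator byte means that config "ab" + name "c" and config "a" + name "bc" do not feed the hash the same byte sequence.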