[PATCH] Have clang list the imported modules in the debug info

Mon Mar 9 14:14:16 PDT 2015

On Mon, Mar 9, 2015 at 1:52 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Feb 24, 2015, at 3:06 PM, David Blaikie <dblaikie at gmail.com> wrote:
> On Tue, Feb 24, 2015 at 2:56 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>> On Feb 24, 2015, at 2:36 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>> On Feb 23, 2015, at 3:37 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>>
>>>> On Feb 23, 2015, at 3:14 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>
>>>> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl <aprantl at apple.com>
>>>> wrote:
>>>>
>>>>> On Feb 23, 2015, at 2:59 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>
>>>>> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl <aprantl at apple.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> > On Jan 20, 2015, at 11:07 AM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > My vague recollection from the previous design discussions was that
>>>>>> these module references would be their own 'unit' COMDAT'd so that we don't
>>>>>> end up with the duplication of every module reference in every unit linked
>>>>>> together when linking debug info?
>>>>>> >
>>>>>> > I think in my brain I'd been picturing this module reference as
>>>>>> being an extended fission reference (fission skeleton CU + extra fields for
>>>>>> users who want to load the Clang AST module directly and skip the split CU).
>>>>>>
>>>>>> Apologies for letting this rest for so long.
>>>>>>
>>>>>> Your memory was of course correct and I didn’t follow up on this
>>>>>> because I had convinced myself that the fission reference would be
>>>>>> completely sufficient. Now that I’ve been thinking some more about it, I
>>>>>> don’t think that it is sufficient in the LTO case.
>>>>>>
>>>>>> Here is the example from the
>>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html:
>>>>>>
>>>>>> foo.o:
>>>>>> .debug_info.dwo
>>>>>>   DW_TAG_compile_unit
>>>>>>      // For DWARF consumers
>>>>>>      DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>>>>>      DW_AT_dwo_id   ([unique AST signature])
>>>>>>
>>>>>> .debug_info
>>>>>>   DW_TAG_compile_unit
>>>>>>     DW_TAG_variable
>>>>>>       DW_AT_name "x"
>>>>>>       DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>>>>>
>>>>>> In this example it is clear that foo.o imported MyModule because its
>>>>>> DWO skeleton is there in the same object file. But if we deal with the
>>>>>> result of an LTO compilation we will end up with many compile units in the
>>>>>> same .debug_info section, plus a bunch of skeleton compile units for _all_
>>>>>> imported modules in the entire project. We thus loose the ability to
>>>>>> determine which of the compile units imported which module.
>>>>>>
>>>>>
>>>>> Why would we need to know which CU imported which modules? (I can
>>>>> imagine some possible reasons, but wondering what you have in mind)
>>>>>
>>>>>
>>>>> When the debugger is stopped at a breakpoint and the user wants to
>>>>> evaluate an expression, it should import the modules that are available at
>>>>> this location, so the user can write the expression from within the context
>>>>> of the breakpoint (e.g., without having to fully qualify each type, etc).
>>>>>
>>>>
>>>> I'm not sure how much current debuggers actually worry about that - (&
>>>> this may differ from lldb to gdb to other things, of course). I'm pretty
>>>> sure at least for GDB, a context in one CU is as good as one in another (at
>>>> least without split-dwarf, type units, etc - with those sometimes things
>>>> end up overly restrictive as the debugger won't search everything properly).
>>>>
>>>> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } and you
>>>> run 'start' in gdb (which breaks at the beginning of main) you can still
>>>> run 'p func()' to call the func, even though there's no declaration of it
>>>> in a.cpp, etc.
>>>>
>>>>
>>>> LLDB would definitely care (as it is using clang for the expression
>>>> evaluation supporting these kinds of features is really straightforward
>>>> there). By importing the modules (rather than searching through the DWARF),
>>>> the expression evaluator gains access to additional declarations that are
>>>> not there in the DWARF, such as templates. But since clang modules are not
>>>> namespaces, we can’t generally "import the world” as a debugger would
>>>> usually do.
>>>>
>>>
>>> Sorry, not sure I understand this last sentence - could you explain
>>> further?
>>>
>>> I imagine it would be rather limiting for the user if they could only
>>> use expressions that are valid in this file from the file - it wouldn't be
>>> uncommon to want to call a function from another module/file/etc to aid in
>>> debugging.
>>>
>>>
>>> Usually LLDB’s expression evaluator works by creating a clang AST type
>>> out of a DWARF type and inserting it into its AST context. We could
>>> pre-polulate it with the definitions from the imported modules (with all
>>> sorts of benefits as described above), but that only works if no two
>>> modules conflict. If the declaration can’t be found in any imported module,
>>> LLDB would still import it from DWARF in the “traditional” fashion.
>>>
>>
>> But it would import it from DWARF in other TUs rather than use the module
>> info just because the module wasn't directly referenced from this TU? That
>> would seem strange to me. (you would lose debug info fidelity (by falling
>> back to DWARF even though there are modules with the full fidelity info)
>> unnecessarily, it sounds like)
>>
>>
>> I think it’s reasonable to expect full fidelity for everything that is
>> available in the current TU, and having the normal DWARF-based debugging
>> capabilities for everything beyond that. But we can only ever provide full
>> fidelity if we have the list of imports for the current TU.
>>
>>
>> Would it be reasonable to use the accelerator table/index to lookup the
>> types, then if the type is in the module you could use the module rather
>> than the DWARF stashed alongside it? (so the comdat'd split-dwarf skeleton
>> CU for the module would have an index to tell you what names are inside it,
>> but if you got an index hit you'd just look at the module instead of
>> loading the split-dwarf debug info in the referenced file)
>>
>>
>> I don’t think this approach would work for templates and enumerator
>> values;
>>
>
> Not sure why enumerator values are an issue - but templates (& all manner
> of other things that don't make it into the index, unfortunately), sure.
>
>
>> they aren’t in the accelerator tables to begin with. It would also be
>> slower if the declaration is available in a module.
>>
>
> Though you're rapidly going to end up loading a lot of modules in (as you
> go up & down a stack printing various things you'll cross into other TUs &
> load more modules).
>
> For a standard DWARF consumer, it seems fine to just have a comdat'd
> skeleton CU for a module without the need for other CUs to mention which
> module CUs they reference (but I could be wrong here) & that's the design
> we originally discussed.
>
> It would seem unfortunate to bloat every CU with a non-deduplicable list
> of every module it references, but if that's necessary for a serialized AST
> aware debugger, it might be fine to have it as an option (so long as it can
> be turned off) & may still benefit from that list not being the
> authoritative module reference, but a /very/ terse reference to it so all
> the extra flags & stuff can be in the deduplicable comdat (& to keep it as
> consistent as possible between the flag (on/off) codepaths for this extra
> data). Maybe a FORM_block (?) of fixed-size hashes of all the modules
> back-to-back, so it's as small as possible?
>
> But I wouldn't mind spending some more time discussing whether there's a
> better way to keep these things streamlined/symmetric/the same between
> modular and non-modular debug info.
>
> Sure!
> Now that we established that recording the list of imported modules for
> every CU is useful for an AST-based debugger,
>

+Richard, just to see if he's got some ideas about how a debugger might
efficiently use modules to support debugger scenarios and whether or not
having a list of which modules are referenced from which contexts is
valuable in that.

It still concerns me that this would create something of a
regression/oddity/difference between AST-based debug info (you wouldn't be
able to handle expressions referencing things in other TUs) and non-AST
based debug info (where I think the average user is used to not worrying
about what headers are included in the current file they're debugging when
they try to use a type or other identifier)

> let’s talk about how to most efficiently represent this information.
>
> In the CU, using DW_TAG_imported_module appears to be the most appropriate
> choice, even though there is some room for confusion since C++ using
> declarations are also represented this way. Inside the
> DW_TAG_imported_module, we could use
> (1) a DW_AT_import that references the skeleton (I hope that is the right
> terminology) CU for the module, the idea being that the skeleton CU would
> contain all the details (flags, name, include dirs, hash, ...) and be in a
> comdat'ed section.
>

I'd be concerned about overloading the terminology & confusing other
debuggers - they might try to follow the DW_AT_import and be surprised that
it doesn't refer to a DW_TAG_namespace tag.

> (2) David’s suggestion of using a custom form that records the module hash
> directly is quite space-efficient, but it has the drawback of not being
> resilient against small changes to the imported module
>

That's going to be true of the normal fission info here (the skeleton CU
and the full CU in the .dwo file (or module) are associated by hash) -
granted, in the "loading an AST" mode, you can ignore those hashes and rely
on your custom attributes instead.

> , since clang’s module hash changes each time the module is being rebuilt.
>

Clang's module hash only changes if the DWARF contents change - it doesn't
use a timestamp or anything. It seems like actually you're going to want to
fail to load even more aggressively - there are ways the AST might've
changed that the debug info doesn't reflect but are still important (a type
unreferenced in this module, but built into some other code that is not
built with debug info changes - no hash changes because the debug info for
that type is unreferenced here, but if you try to use it you could have an
incompatible layout, etc).

> This is less of an issue if the hash is referring to a skeleton CU in the
> same file, which contains all the detailed information.
>
> Personally I’d prefer option 1 because mostly uses the existing mechanisms
> from DWARF. Here’s a visual guide to the options on the table:
>
> (1)
> foo.o (compiled with, let’s call it .. "-gmodule-imports”)
> -----
> .debug_info:
>   DW_TAG_compile_unit
>     DW_AT_name(“foo.c”)
>     DW_TAG_imported_module
>       DW_AT_import(DW_FORM_ref_addr 0x123)  // Could be a FORM_ref_sig8
> 0x1234ABCDE as well.
>     DW_TAG_imported_module
>       DW_AT_import(...)
>
> .debug_info.dwo:
> // Skeleton CUs for modules imported by foo.o.
> 0x123:
>   DW_TAG_compile_unit
>     // Used by split-dwarf debuggers to find external type definitions.
>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/
> Foundation.pcm”)
>     DW_AT_dwo_id(“0x1234ABCDE”)
>
>     // Used by AST-based debuggers to import the module.
>     DW_AT_name(“Foundation”)
>

(side notes: the mixed indentation here makes it a bit hard to read this
example, and I'd make sure /all/ the extended attributes (including the
name here) use custom attribute names, not standard ones)

>     DW_AT_LLVM_sysroot(“/“)
>     DW_AT_LLVM_include_dir(“”)
>     DW_AT_LLVM_macros(“-DNDEBUG”)
>
> (2)
> .debug_info.dwo:
> (As above.)
>
> .debug_info:
>   DW_TAG_compile_unit
>     DW_AT_name(“foo.c”)
>     DW_AT_LLVM_imported_modules(DW_FORM_block 0x1234ABCDE 0xDEADBEEF
> 0x....)
>
> Now I’m curious what option (3) will look like; the one that we’ll
> actually implement!
>

;)

> -- adrian
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150309/55702c01/attachment.html>