[PATCH] Have clang list the imported modules in the debug info

Tue Feb 24 15:06:04 PST 2015

On Tue, Feb 24, 2015 at 2:56 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Feb 24, 2015, at 2:36 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Feb 23, 2015, at 3:37 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> On Feb 23, 2015, at 3:14 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>>
>>> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>>
>>>>
>>>> On Feb 23, 2015, at 2:59 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl <aprantl at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> > On Jan 20, 2015, at 11:07 AM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > My vague recollection from the previous design discussions was that
>>>>> these module references would be their own 'unit' COMDAT'd so that we don't
>>>>> end up with the duplication of every module reference in every unit linked
>>>>> together when linking debug info?
>>>>> >
>>>>> > I think in my brain I'd been picturing this module reference as
>>>>> being an extended fission reference (fission skeleton CU + extra fields for
>>>>> users who want to load the Clang AST module directly and skip the split CU).
>>>>>
>>>>> Apologies for letting this rest for so long.
>>>>>
>>>>> Your memory was of course correct and I didn’t follow up on this
>>>>> because I had convinced myself that the fission reference would be
>>>>> completely sufficient. Now that I’ve been thinking some more about it, I
>>>>> don’t think that it is sufficient in the LTO case.
>>>>>
>>>>> Here is the example from
>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html:
>>>>>
>>>>> foo.o:
>>>>> .debug_info.dwo
>>>>>   DW_TAG_compile_unit
>>>>>      // For DWARF consumers
>>>>>      DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>>>>      DW_AT_dwo_id   ([unique AST signature])
>>>>>
>>>>> .debug_info
>>>>>   DW_TAG_compile_unit
>>>>>     DW_TAG_variable
>>>>>       DW_AT_name "x"
>>>>>       DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>>>>
>>>>> In this example it is clear that foo.o imported MyModule because its
>>>>> DWO skeleton is there in the same object file. But if we deal with the
>>>>> result of an LTO compilation we will end up with many compile units in the
>>>>> same .debug_info section, plus a bunch of skeleton compile units for _all_
>>>>> imported modules in the entire project. We thus lose the ability to
>>>>> determine which of the compile units imported which module.
>>>>>
>>>>
>>>> Why would we need to know which CU imported which modules? (I can
>>>> imagine some possible reasons, but wondering what you have in mind)
>>>>
>>>>
>>>> When the debugger is stopped at a breakpoint and the user wants to
>>>> evaluate an expression, it should import the modules that are available at
>>>> this location, so the user can write the expression from within the context
>>>> of the breakpoint (e.g., without having to fully qualify each type, etc).
>>>>
>>>
>>> I'm not sure how much current debuggers actually worry about that - (&
>>> this may differ from lldb to gdb to other things, of course). I'm pretty
>>> sure at least for GDB, a context in one CU is as good as one in another (at
>>> least without split-dwarf, type units, etc - with those sometimes things
>>> end up overly restrictive as the debugger won't search everything properly).
>>>
>>> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } and you
>>> run 'start' in gdb (which breaks at the beginning of main) you can still
>>> run 'p func()' to call the func, even though there's no declaration of it
>>> in a.cpp, etc.
>>>
>>>
>>> LLDB would definitely care (as it is using clang for the expression
>>> evaluation, supporting these kinds of features is really straightforward
>>> there). By importing the modules (rather than searching through the DWARF),
>>> the expression evaluator gains access to additional declarations that are
>>> not there in the DWARF, such as templates. But since clang modules are not
>>> namespaces, we can’t generally “import the world” as a debugger would
>>> usually do.
>>>
>>
>> Sorry, not sure I understand this last sentence - could you explain
>> further?
>>
>> I imagine it would be rather limiting for the user if they could only use
>> expressions that are valid in this file from the file - it wouldn't be
>> uncommon to want to call a function from another module/file/etc to aid in
>> debugging.
>>
>>
>> Usually LLDB’s expression evaluator works by creating a clang AST type
>> out of a DWARF type and inserting it into its AST context. We could
>> pre-populate it with the definitions from the imported modules (with all
>> sorts of benefits as described above), but that only works if no two
>> modules conflict. If the declaration can’t be found in any imported module,
>> LLDB would still import it from DWARF in the “traditional” fashion.
>>
>
> But it would import it from DWARF in other TUs rather than use the module
> info just because the module wasn't directly referenced from this TU? That
> would seem strange to me. (you would lose debug info fidelity (by falling
> back to DWARF even though there are modules with the full fidelity info)
> unnecessarily, it sounds like)
>
>
> I think it’s reasonable to expect full fidelity for everything that is
> available in the current TU, and having the normal DWARF-based debugging
> capabilities for everything beyond that. But we can only ever provide full
> fidelity if we have the list of imports for the current TU.
>
>
> Would it be reasonable to use the accelerator table/index to look up the
> types, then if the type is in the module you could use the module rather
> than the DWARF stashed alongside it? (so the comdat'd split-dwarf skeleton
> CU for the module would have an index to tell you what names are inside it,
> but if you got an index hit you'd just look at the module instead of
> loading the split-dwarf debug info in the referenced file)
>
>
> I don’t think this approach would work for templates and enumerator values;
>

Not sure why enumerator values are an issue - but templates (& all manner
of other things that don't make it into the index, unfortunately), sure.

> they aren’t in the accelerator tables to begin with. It would also be
> slower if the declaration is available in a module.
>

Though you're rapidly going to end up loading a lot of modules in (as you
go up & down a stack printing various things you'll cross into other TUs &
load more modules).

For a standard DWARF consumer, it seems fine to just have a comdat'd
skeleton CU for a module without the need for other CUs to mention which
module CUs they reference (but I could be wrong here) & that's the design
we originally discussed.

It would seem unfortunate to bloat every CU with a non-deduplicable list of
every module it references, but if that's necessary for a serialized AST
aware debugger, it might be fine to have it as an option (so long as it can
be turned off) & may still benefit from that list not being the
authoritative module reference, but a /very/ terse reference to it so all
the extra flags & stuff can be in the deduplicable comdat (& to keep it as
consistent as possible between the flag (on/off) codepaths for this extra
data). Maybe a FORM_block (?) of fixed-size hashes of all the modules
back-to-back, so it's as small as possible?
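
To make the "fixed-size hashes back-to-back" idea concrete, here is a rough
sketch of what such a block payload could look like. Everything in it is an
assumption for illustration only: the 8-byte hash width (chosen to match the
size of a dwo_id / ref_sig8 signature), the little-endian byte order, and the
helper names packModuleHashes / unpackModuleHashes are hypothetical and are not
part of clang, LLVM, or any patch in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: concatenate 64-bit module signatures into one
// contiguous byte block (8 bytes per module, little-endian, no padding).
std::vector<uint8_t> packModuleHashes(const std::vector<uint64_t> &hashes) {
  std::vector<uint8_t> block;
  block.reserve(hashes.size() * 8);
  for (uint64_t h : hashes)
    for (int i = 0; i < 8; ++i)
      block.push_back(static_cast<uint8_t>(h >> (8 * i)));
  return block;
}

// Hypothetical consumer side: split the block back into 64-bit signatures.
// Each signature would then be matched against the id stored in the
// deduplicated skeleton CU of the corresponding module.
std::vector<uint64_t> unpackModuleHashes(const std::vector<uint8_t> &block) {
  assert(block.size() % 8 == 0 && "block must hold whole 8-byte hashes");
  std::vector<uint64_t> hashes;
  for (std::size_t off = 0; off < block.size(); off += 8) {
    uint64_t h = 0;
    for (int i = 0; i < 8; ++i)
      h |= static_cast<uint64_t>(block[off + i]) << (8 * i);
    hashes.push_back(h);
  }
  return hashes;
}
```

With this layout the per-CU cost would be 8 bytes per imported module plus the
block length, and all of the descriptive data (paths, flags, macros) would stay
in the comdat'd module CU.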

But I wouldn't mind spending some more time discussing whether there's a
better way to keep these things streamlined/symmetric/the same between
modular and non-modular debug info.

- David

>
> -- adrian
>
>
> - David
>
>
>
>
>>
>> -- adrian
>>
>>
>>
>>>
>>> -- adrian
>>>
>>>
>>>>
>>>>> I think it really is necessary to put the info about the module
>>>>> imported into the compile unit that imported it. Or is there a way to do
>>>>> this using the fission capabilities that I’m not aware of?
>>>>>
>>>>> -- adrian
>>>>>
>>>>> >
>>>>> > [rambling a bit more along those lines:
>>>>> > This would work fine in the case of the module (now an object file)
>>>>> containing all the static debug info
>>>>> > The future step, when we put IR/object code in a module to be linked
>>>>> into the final binary, we could put the skeleton CU in that object file
>>>>> that's being linked in (then we wouldn't need to COMDAT it) or, optionally,
>>>>> link in the debug info itself (skipping the indirection through the
>>>>> external file) if a standalone debug info executable was desired]
>>>>>
>>>>>
>>>>>
>>>>> >
>>>>> > On Tue, Jan 20, 2015 at 9:39 AM, Adrian Prantl <aprantl at apple.com>
>>>>> wrote:
>>>>> > As a complementary part of the module debugging story, here is a
>>>>> proposal to list the imported modules in the debug info. This patch is not
>>>>> about efficiency, but rather enables a cool debugging feature:
>>>>> >
>>>>> > Record the clang modules imported by the current compile unit in the
>>>>> debug info. This allows a module-aware debugger (such as LLDB) to @import
>>>>> all modules visible in the current context before evaluating an expression,
>>>>> thus making available all declarations in the current context (that
>>>>> originate from a module) and not just the ones that were actually used by
>>>>> the program.
>>>>> >
>>>>> > This implementation uses existing DWARF mechanisms as much as
>>>>> possible by emitting a DW_TAG_imported_module that references a
>>>>> DW_TAG_module, which contains the information necessary for the debugger to
>>>>> rebuild the module. This is similar to how C++ using declarations are
>>>>> encoded in DWARF, with the difference that we're importing a module instead
>>>>> of a namespace.
>>>>> > The information stored for a module includes the umbrella directory,
>>>>> any config macros passed in via the command line that affect the module,
>>>>> and the filename of the raw .pcm file. Why include all these parameters
>>>>> when we have the .pcm file? Apart from module cache volatility, there is
>>>>> no guarantee that the debugger was linked against the same version of clang
>>>>> that generated the .pcm, so it may need to regenerate the module while
>>>>> importing it.
>>>>> >
>>>>> > Let me know what you think!
>>>>> > -- adrian
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>