[PATCH] Have clang list the imported modules in the debug info

Mon Feb 23 15:37:05 PST 2015

On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Feb 23, 2015, at 3:14 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Feb 23, 2015, at 2:59 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> > On Jan 20, 2015, at 11:07 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >
>>> > My vague recollection from the previous design discussions was that
>>> these module references would be their own 'unit' COMDAT'd so that we don't
>>> end up with the duplication of every module reference in every unit linked
>>> together when linking debug info?
>>> >
>>> > I think in my brain I'd been picturing this module reference as being
>>> an extended fission reference (fission skeleton CU + extra fields for users
>>> who want to load the Clang AST module directly and skip the split CU).
>>>
>>> Apologies for letting this rest for so long.
>>>
>>> Your memory was of course correct and I didn’t follow up on this because
>>> I had convinced myself that the fission reference would be completely
>>> sufficient. Now that I’ve been thinking some more about it, I don’t think
>>> that it is sufficient in the LTO case.
>>>
>>> Here is the example from
>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html:
>>>
>>> foo.o:
>>> .debug_info.dwo
>>>   DW_TAG_compile_unit
>>>      // For DWARF consumers
>>>      DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>>      DW_AT_dwo_id   ([unique AST signature])
>>>
>>> .debug_info
>>>   DW_TAG_compile_unit
>>>     DW_TAG_variable
>>>       DW_AT_name "x"
>>>       DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>>
>>> In this example it is clear that foo.o imported MyModule because its DWO
>>> skeleton is there in the same object file. But if we deal with the result
>>> of an LTO compilation we will end up with many compile units in the same
>>> .debug_info section, plus a bunch of skeleton compile units for _all_
>>> imported modules in the entire project. We thus lose the ability to
>>> determine which of the compile units imported which module.
>>>
>>
>> Why would we need to know which CU imported which modules? (I can imagine
>> some possible reasons, but wondering what you have in mind)
>>
>>
>> When the debugger is stopped at a breakpoint and the user wants to
>> evaluate an expression, it should import the modules that are available at
>> this location, so the user can write the expression from within the context
>> of the breakpoint (e.g., without having to fully qualify each type, etc).
>>
>
> I'm not sure how much current debuggers actually worry about that - (&
> this may differ from lldb to gdb to other things, of course). I'm pretty
> sure at least for GDB, a context in one CU is as good as one in another (at
> least without split-dwarf, type units, etc - with those sometimes things
> end up overly restrictive as the debugger won't search everything properly).
>
> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } and you run
> 'start' in gdb (which breaks at the beginning of main) you can still run 'p
> func()' to call the func, even though there's no declaration of it in
> a.cpp, etc.
>
>
> LLDB would definitely care (as it is using clang for expression
> evaluation, supporting these kinds of features is really straightforward
> there). By importing the modules (rather than searching through the DWARF),
> the expression evaluator gains access to additional declarations that are
> not there in the DWARF, such as templates. But since clang modules are not
> namespaces, we can’t generally “import the world” as a debugger would
> usually do.
>

Sorry, not sure I understand this last sentence - could you explain further?

I imagine it would be rather limiting for the user if they could only use
expressions that are valid in the file where they are stopped - it wouldn't be
uncommon to want to call a function from another module/file/etc to aid in
debugging.
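
A minimal sketch of the kind of situation I mean. The split into two files
and the names `describe` and `sum` are made up for illustration, they are not
taken from the patch, and both files are shown in one listing so it compiles
on its own:

```cpp
// Hypothetical illustration only: names and file layout are assumptions.
#include <cstddef>
#include <string>
#include <vector>

// --- b.cpp: a debugging helper that a.cpp never declares or includes ---
std::string describe(const std::vector<int> &values) {
  std::string out = "[";
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i != 0)
      out += ", ";
    out += std::to_string(values[i]);
  }
  return out + "]";
}

// --- a.cpp: the code the user is stopped in ---
int sum(const std::vector<int> &values) {
  int total = 0;
  for (int v : values)
    total += v; // imagine a breakpoint on this line
  return total;
}
```

Stopped inside `sum`, I would want to be able to evaluate something like
`describe(values)` even though nothing visible in a.cpp declares it.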

>
> -- adrian
>
>
>>
>>> I think it really is necessary to put the info about the module imported
>>> into the compile unit that imported it. Or is there a way to do this using
>>> the fission capabilities that I’m not aware of?
>>>
>>> -- adrian
>>>
>>> >
>>> > [rambling a bit more along those lines:
>>> > This would work fine in the case of the module (now an object file)
>>> containing all the static debug info
>>> > The future step, when we put IR/object code in a module to be linked
>>> into the final binary, we could put the skeleton CU in that object file
>>> that's being linked in (then we wouldn't need to COMDAT it) or, optionally,
>>> link in the debug info itself (skipping the indirection through the
>>> external file) if a standalone debug info executable was desired]
>>>
>>>
>>>
>>> >
>>> > On Tue, Jan 20, 2015 at 9:39 AM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>> > As a complementary part of the module debugging story, here is a
>>> proposal to list the imported modules in the debug info. This patch is not
>>> about efficiency, but rather enables a cool debugging feature:
>>> >
>>> > Record the clang modules imported by the current compile unit in the
>>> debug info. This allows a module-aware debugger (such as LLDB) to @import
>>> all modules visible in the current context before evaluating an expression,
>>> thus making available all declarations in the current context (that
>>> originate from a module) and not just the ones that were actually used by
>>> the program.
>>> >
>>> > This implementation uses existing DWARF mechanisms as much as possible
>>> by emitting a DW_TAG_imported_module that references a DW_TAG_module, which
>>> contains the information necessary for the debugger to rebuild the module.
>>> This is similar to how C++ using declarations are encoded in DWARF, with
>>> the difference that we're importing a module instead of a namespace.
>>> > The information stored for a module includes the umbrella directory,
>>> any config macros passed in via the command line that affect the module,
>>> and the filename of the raw .pcm file. Why include all these parameters
>>> when we have the .pcm file? Apart from module cache volatility, there is
>>> no guarantee that the debugger was linked against the same version of clang
>>> that generated the .pcm, so it may need to regenerate the module while
>>> importing it.
>>> >
>>> > Let me know what you think!
>>> > -- adrian
>>> >
>>> >
>>> >
>>>
>>>
>>
>>
>
>