[PATCH] Add a DWOId field to DICompileUnit (so DWARF skeleton CUs can be expressed in IR).

Mon May 18 16:38:01 PDT 2015

> On May 8, 2015, at 4:08 PM, Adrian Prantl <aprantl at apple.com> wrote:
> 
> 
>> On May 7, 2015, at 11:25 AM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>> 
>> 
>>> On May 6, 2015, at 3:27 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>> 
>>> 
>>> 
>>> On Wed, May 6, 2015 at 3:20 PM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>> 
>>>> On May 6, 2015, at 3:05 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Wed, May 6, 2015 at 3:02 PM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>>> 
>>>>> On May 6, 2015, at 2:55 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, May 6, 2015 at 2:50 PM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>>>> 
>>>>>> On May 6, 2015, at 2:44 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, May 6, 2015 at 2:41 PM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>>>>> 
>>>>>>> On May 6, 2015, at 2:38 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, May 6, 2015 at 2:25 PM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>>>>>> 
>>>>>>>> On May 5, 2015, at 1:23 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, May 5, 2015 at 11:52 AM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>>>>>>> 
>>>>>>>>> On May 5, 2015, at 9:42 AM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, May 5, 2015 at 8:35 AM, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>>>>>>>> David, I’m not entirely sure whether you are OK with the naming of the field as is. Is this good to commit from your side?
>>>>>>>>> 
>>>>>>>>> It's a bit generic/misleading given that for the common case of plain Fission the DWO ID is generated based on the DWARF in the backend, so it doesn't make sense to carry it as part of the metadata because it cannot be known then.
>>>>>>>>> 
>>>>>>>>> What's the particular part of the module debug info this will be used for? Your comment says it's specifically for the skeleton side (so we'll take the module ID in the frontend and emit a skeleton CU into the metadata)? What about the concrete side for the module's debug info? (it'll need the same dwo ID)
>>>>>>>> 
>>>>>>>> I casually mentioned that in one of the emails in the other thread; the idea is that when emitting a clang module the backend computes the regular DWARF DWOId for it. When emitting an external type reference, the frontend is using libDebugInfo to extract the DWOId from the module and emits a skeleton CU with pcm filename and DWOId.
>>>>>>>> 
>>>>>>>> That seems a bit heavy... I thought we discussed with Richard Smith that the module has an actual identifier we could use here?
>>>>>>> 
>>>>>>> I realize now that I haven't communicated this problem back to the list after discovering it, and I apologize for that: It is true that each type has a unique TypeId in the module,
>>>>>>> 
>>>>>>> I don't think I was suggesting using a module based type id - I figured we'd use the type IDs that type units already use (a hash of the fully qualified name - same as the type uniquing in debug info metadata that was implemented years ago (well the type uniquing uses the mangled name itself - I just hashed that to create the type identifier for the comdat section and type signature for type units))
>>>>>>> 
>>>>>>> In any case, I wasn't asking about the type identifier, I was asking about the module identifier (the DWO ID).
>>>>>> 
>>>>>> Oh. The module does have a module id, but that value is the result of running the random number generator in a loop until we hit a non-zero value, so it’s not exactly resilient against rebuilding the module.
>>>>>> 
>>>>>> OK, that sounds like the module ID that Chandler came across - apparently it was added to work around instability in the module output. Chandler's since fixed the module instability and removed this under a flag (I guess the flag is still on by default for implicit modules builds).
>>>>>> 
>>>>>> Though I'm confused - wouldn't rebuilding the module /want/ to change the identifier so these don't match anymore? The module might have different things compared to what the original code was built against... 
>>>>> 
>>>>> Remember that on Darwin the module cache is this (sort of) global shared space that in theory could have been cleared by the time llvm-dsymutil wants to link the debug info.
>>>>> In that case I want to be able to rebuild the module and have the exact same DWO ID or error.
>>>>> 
>>>>> Using the DWO ID seems insufficient if you want the ASTs to all match up - DWARF doesn't have all the information in the AST, it's lossy. Which means you could have a change to the ASTs that makes the modules incompatible that wouldn't be reflected in a hash of the DWARF.
>>>> 
>>>> That’s ok as far as dsymutil is concerned. Dsymutil itself only cares about DWARF and is happy as long as it can resolve all external type references and replace them with DW_FORM_ref_addrs.
>>>> [The dSYM bundle created by dsymutil would still contain the TAG_module, so if a debugger later decides to import the module it could still run into problems, but as far as dsymutil is concerned the DWARF is all that matters.]
>>>> 
>>>> OK - but for LLDB you're going to want to use some kind of identifier that's resilient to benign rebuilds with AST accuracy, perhaps? So we could use that number for the DWO ids?
>>>> 
>>>> I'd like to seriously consider options that would avoid parsing the DWARF in a module during compilation to retrieve the DWO ID to reference from the .o debug info... that just seems a bit heavy weight, as I said.
>>> 
>>> That’s a valid concern. It would be an easier sell if the DWO ID were part of a header. Keep in mind that we will have to open the module with libObject anyway to get to the AST section
>>> 
>>> I assume we're just going to merge the module object file and the debug info object file? Or is there something else in mind? (I'd like to avoid having to read from/write to either after they've been written out - (& if we could avoid writing them to separate files then merging, that'd be a bonus, certainly))
>> 
>> Can you clarify what you mean with module object file vs debug info object file, to ensure we’re talking about the same thing?
>> What I meant was that in the .pcm file there is a .clang_ast section that contains the serialized AST and also (among others) a .debug_info section that contains the debug info for the module. The LLVMModuleProvider (there’s a patch waiting on Richard’s approval floating around in cfe-commits), which is responsible for loading a clang module from disk, opens the .pcm file and hands a MemoryBuffer with the .clang_ast section to clang’s module deserializer while also extracting the DWOId from the .debug_info section. There is no other parsing of debug info besides getting to the DWOId.
> 
> We could eliminate the need for linking against DebugInfo by emitting a .debug_cu_index section into the .pcm with only a single entry and have the frontend parse that first hash table entry to extract the dwo id.

I’ve been thinking some more about this and I’m no longer opposed to simply using the clang module id as the dwo_id. It has the big advantage of being very cheap to extract from the module, plus we don’t need to compute a DWARF hash then. Since it is quite improbable that the clang module cache is purged while building a project, we might as well go ahead and use the existing random module hash and then fix clang’s module hash to become deterministic if that should turn out to be a problem in practice.
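
For illustration, a minimal C++ sketch of the "random number generator in a loop until we hit a non-zero value" scheme mentioned earlier in the thread. This is not clang's actual code: makeNonZeroModuleId is a hypothetical name, and the choice of std::mt19937_64 as the generator is an assumption made only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper, not clang's implementation: draw random 64-bit
// values until one is non-zero, so that 0 can keep meaning "no DWO ID".
inline std::uint64_t makeNonZeroModuleId(std::mt19937_64 &rng) {
  std::uint64_t id = 0;
  while (id == 0)
    id = rng(); // a zero result is simply redrawn
  assert(id != 0 && "module id must be non-zero");
  return id;
}
```

An id produced this way depends only on the generator state, not on the module contents, which is why a rebuilt module would normally get a different value unless the hash is later made deterministic.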

-- adrian

>> 
>> -- adrian
>> 
>>>  
>>> and the DWARF parsing would be restricted to the first abbreviation entry and the first compile unit DIE and no relocations necessary.
>>> 
>>> -- adrian
>>> 
>>>> 
>>>> - David
>>>>  
>>>> -- adrian
>>>> 
>>>>>  
>>>>> In other situations the requirements are less strict. If I’m just debugging without a dSYM I could imagine being happy with a best effort after rebuilding a module that is “close enough”.
>>>>> 
>>>>> -- adrian
>>>>> 
>>>>>>  
>>>>>> 
>>>>>> -- adrian
>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>>> but this is just an integer counter that is not necessarily stable across rebuilds. With modules becoming more deterministic, an id should survive a rebuild of the module, but if the header file the module is built with is altered in any way, the ids could be shuffled. It is more resilient (but still fast enough) to look up a type by decl context + name.
>>>>>>> 
>>>>>>> -- adrian
>>>>>>> 
>>>>>>>>> While I'm all for incremental development, a little context of the path you've got in mind may be helpful.
>>>>>>>> 
>>>>>>>> Right, the examples in cfe-dev were always DWARF and never showed the IR:
>>>>>>>> A skeleton CU emitted by the frontend would be represented as a
>>>>>>>>   !0 = DICompileUnit(file: MDFile("/path/to", "module.pcm"), dwoId: ABCD1234)
>>>>>>>> and is emitted as a compile_unit with an AT_dwo_name and an AT_dwo_id by the backend.
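
The mapping described above can be sketched in C++ roughly as follows. This is a toy model, not LLVM's DICompileUnit or its DWARF emitter: CompileUnitSketch and describeSkeletonCU are hypothetical names, the attribute spellings follow the wording of the mail, and treating a dwoId of 0 as "ordinary, non-skeleton CU" is an assumption based on the autoupgrade test's default of 0.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical model of the fields discussed in this thread.
struct CompileUnitSketch {
  std::string directory;
  std::string filename;
  std::uint64_t dwoId; // assumption: 0 means "no DWO ID"
};

// Describe the attributes a backend might attach to the compile_unit.
inline std::string describeSkeletonCU(const CompileUnitSketch &cu) {
  assert(!cu.filename.empty() && "a CU needs a file name");
  std::ostringstream out;
  out << "compile_unit\n";
  if (cu.dwoId == 0) {
    // Ordinary CU: no skeleton attributes are emitted.
    out << "  AT_name: " << cu.directory << "/" << cu.filename << "\n";
  } else {
    // Skeleton CU: point at the .pcm file and carry the id.
    out << "  AT_dwo_name: " << cu.directory << "/" << cu.filename << "\n";
    out << "  AT_dwo_id: 0x" << std::hex << cu.dwoId << "\n";
  }
  return out.str();
}
```

Called with {"/path/to", "module.pcm", 0xABCD1234}, the function lists both the dwo name and the dwo id; with a dwoId of 0 neither skeleton attribute appears.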
>>>>>>>> 
>>>>>>>> OK.
>>>>>>>>  
>>>>>>>> 
>>>>>>>> -- adrian
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - David
>>>>>>>>>  
>>>>>>>>> 
>>>>>>>>> -- adrian
>>>>>>>>> 
>>>>>>>>> > On May 4, 2015, at 9:04 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com <mailto:dexonsmith at apple.com>> wrote:
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >> On 2015 May 4, at 20:17, Adrian Prantl <aprantl at apple.com <mailto:aprantl at apple.com>> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Now with autoupgrade testcase.
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> http://reviews.llvm.org/D9488 <http://reviews.llvm.org/D9488>
>>>>>>>>> >>
>>>>>>>>> >> Files:
>>>>>>>>> >> include/llvm/IR/DIBuilder.h
>>>>>>>>> >> include/llvm/IR/DebugInfoMetadata.h
>>>>>>>>> >> lib/AsmParser/LLParser.cpp
>>>>>>>>> >> lib/Bitcode/Reader/BitcodeReader.cpp
>>>>>>>>> >> lib/Bitcode/Writer/BitcodeWriter.cpp
>>>>>>>>> >> lib/IR/AsmWriter.cpp
>>>>>>>>> >> lib/IR/DIBuilder.cpp
>>>>>>>>> >> lib/IR/DebugInfoMetadata.cpp
>>>>>>>>> >> lib/IR/LLVMContextImpl.h
>>>>>>>>> >> test/Assembler/mdcompileunit.ll
>>>>>>>>> >> test/Bitcode/DICompileUnit-upgrade.test
>>>>>>>>> >> test/Bitcode/Inputs/DICompileUnit-no-DWOId.bc
>>>>>>>>> >> unittests/IR/MetadataTest.cpp
>>>>>>>>> >>
>>>>>>>>> >> EMAIL PREFERENCES
>>>>>>>>> >> http://reviews.llvm.org/settings/panel/emailpreferences/ <http://reviews.llvm.org/settings/panel/emailpreferences/>
>>>>>>>>> >> <D9488.24927.patch>
>>>>>>>>> >
>>>>>>>>> > LGTM with some changes to the autoupgrade test.
>>>>>>>>> >
>>>>>>>>> >> Index: test/Bitcode/DICompileUnit-upgrade.test
>>>>>>>>> >> ===================================================================
>>>>>>>>> >> --- /dev/null
>>>>>>>>> >> +++ test/Bitcode/DICompileUnit-upgrade.test
>>>>>>>>> >
>>>>>>>>> > Can you be more descriptive?  Perhaps dicompileunit-no-dwoid.ll.
>>>>>>>>> > (Feel free to use the camel-case filename; I personally avoid
>>>>>>>>> > capitals but clearly there's no harm.)
>>>>>>>>> >
>>>>>>>>> >> @@ -0,0 +1,6 @@
>>>>>>>>> >> +RUN: llvm-dis %p/Inputs/DICompileUnit-no-DWOId.bc -o - | FileCheck %s
>>>>>>>>> >
>>>>>>>>> > %p was undocumented last I checked.  I recommend the documented %S
>>>>>>>>> > when this sort of functionality is necessary.
>>>>>>>>> >
>>>>>>>>> > But I wouldn't even use the `Inputs` folder here.  The usual bitcode
>>>>>>>>> > upgrade tests say:
>>>>>>>>> >
>>>>>>>>> >    ; RUN: llvm-dis < %s.bc | FileCheck %s
>>>>>>>>> >    ; RUN: verify-uselistorder < %s.bc
>>>>>>>>> >
>>>>>>>>> > and just drop a .ll.bc file next to a .ll test in `test/Bitcode`.
>>>>>>>>> >
>>>>>>>>> > Moreover, the test file should be the `.ll` file that was used to
>>>>>>>>> > generate the bitcode.  Please add a comment that says what revision of
>>>>>>>>> > LLVM was used to generate the bitcode, something like:
>>>>>>>>> >
>>>>>>>>> >    ; Bitcode generated from llvm-as @ r123456.
>>>>>>>>> >
>>>>>>>>> >> +The input uses the older form without a dwoId field.
>>>>>>>>> >> +This should default to 0,
>>>>>>>>> >> +which is not displayed at all in the textual representation.
>>>>>>>>> >> +CHECK: !DICompileUnit
>>>>>>>>> >> +CHECK-NOT: dwoId:
>>>>>>>>> >
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
