[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

Eric Christopher echristo at gmail.com
Tue Oct 15 13:31:17 PDT 2013


On Tue, Oct 15, 2013 at 1:29 PM, Manman Ren <manman.ren at gmail.com> wrote:
>
>
>
> On Tue, Oct 15, 2013 at 11:34 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>>
>> On Tue, Oct 15, 2013 at 10:51 AM, Manman Ren <manman.ren at gmail.com> wrote:
>>>
>>>
>>>
>>>
>>> On Tue, Oct 15, 2013 at 10:10 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Oct 15, 2013 at 10:05 AM, Manman Ren <manman.ren at gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 9, 2013 at 5:22 PM, Manman Ren <manman.ren at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 9, 2013 at 1:32 PM, Manman Ren <manman.ren at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> David,
>>>>>>>
>>>>>>> Thanks for reviewing!
>>>>>>>
>>>>>>> On Wed, Oct 9, 2013 at 11:36 AM, David Blaikie <dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Might be easier if these were on Phabricator, but here are some
>>>>>>>> thoughts:
>>>>>>>>
>>>>>>>> 0001:
>>>>>>>>   This patch generally, while separated for legibility, is untested
>>>>>>>> & difficult to discuss in isolation.
>>>>>>>
>>>>>>> I agree, this patch adds the functionality but does not use it, the
>>>>>>> 2nd patch uses ref_addr.
>>>>>>> If you think I should merge the two and commit as a single patch, let
>>>>>>> me know.
>>>>>>>
>>>>>>>>
>>>>>>>> I may need to refer to the second patch in reviewing this first one.
>>>>>>>>   DwarfDebug.cpp:
>>>>>>>>     computeSizeAndOffsets:
>>>>>>>>       I believe this produces the wrong offset for the 3rd CU and
>>>>>>>> onwards. computeSizeAndOffset returns the EndOffset which is absolute, not
>>>>>>>> relative to the Offset passed in, so it should be assigned to SecOffset, not
>>>>>>>> added to it. (eg: if you have CUs at 0, 42, and 57 - the first pass through
>>>>>>>> SecOffset will be zero, then it'll be 0 + 42, then on the 3rd it'll be (0 +
>>>>>>>> 42) + 57 which isn't right, it should just be 57). Please add more test
>>>>>>>> coverage while fixing this issue.
>>>>>>>
>>>>>>>
>>>>>>> computeSizeAndOffset takes an offset that is relative to the start of
>>>>>>> the CU and returns the offset relative to the CU after laying out the DIE.
>>>>>>> The initial offset before laying out the CU DIE is the header size,
>>>>>>> EndOffset will be the offset relative to the CU after laying out the whole
>>>>>>> CU DIE.
>>>>>>> We can think of EndOffset as the size of the whole CU DIE. SecOffset
>>>>>>> is the offset from the Debug Info section, so we can update it by adding the
>>>>>>> CU size.
>>>>>>>
>>>>>>>   // Offset from the beginning of debug info section.
>>>>>>>   unsigned SecOffset = 0;
>>>>>>>   for (SmallVectorImpl<CompileUnit *>::iterator I = CUs.begin(),
>>>>>>>          E = CUs.end(); I != E; ++I) {
>>>>>>>     (*I)->setDebugInfoOffset(SecOffset);
>>>>>>>     unsigned Offset =
>>>>>>>       sizeof(int32_t) + // Length of Compilation Unit Info
>>>>>>>       sizeof(int16_t) + // DWARF version number
>>>>>>>       sizeof(int32_t) + // Offset Into Abbrev. Section
>>>>>>>       sizeof(int8_t);   // Pointer Size (in bytes)
>>>>>>>
>>>>>>>     unsigned EndOffset = computeSizeAndOffset((*I)->getCUDie(),
>>>>>>> Offset);
>>>>>>>     SecOffset += EndOffset;
>>>>>>>   }
>>>>>>
>>>>>>
>>>>>> Added more comments in attached patch.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Eric/Manman: rough design question: compute the absolute offset of
>>>>>>>> each CU within the debug_info section and describe them all relative to a
>>>>>>>> single symbol at the start of the debug_info section, or should we put a
>>>>>>>> label at the start of each CU?
>>>>>>>
>>>>>>>
>>>>>>> Either way should work. But since we already have the section offset
>>>>>>> for each CU, describing relative to the start of debug_info section saves us
>>>>>>> a few symbols :)
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 0002:
>>>>>>>>   ref_addr_relocation.ll:
>>>>>>>>     seems a bit vague in terms of how you test for the relocation. I
>>>>>>>> think it'd make more sense to test the assembly, than the reafobj output,
>>>>>>>> that way you can test that the correct bytes have the relocation rather than
>>>>>>>> just that there's "some" .debug_info relocation in the file. For an example,
>>>>>>>> see test/DebugInfo/X86/tls.ll I wrote - it has some "unannotated" bytes
>>>>>>>> because we still don't have a nice way to annotate location bytes that are
>>>>>>>> DWARF expressions, but it's close - I guess those should be CHECK-NEXTs,
>>>>>>>> though. In any case, you should be able to check a few lines of assembly
>>>>>>>> with the # DW_AT/DW_TAG annotation comments.
>>>>>>>
>>>>>>>
>>>>>>> I can check for ".quad .Lsection_info+38 #DW_AT_type".
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>     You'd need to add the tu3.cpp from my example if you want to
>>>>>>>> demonstrate that the relocation is actually working as intended and avoiding
>>>>>>>> the bogus result I showed.
>>>>>>>>   type-unique-simple-a.ll
>>>>>>>>     While I agree that having common test cases helps reduce the
>>>>>>>> number of separate invocations we have, this test case seems like it might
>>>>>>>> be becoming a little hard to follow what's under test. Just going from the
>>>>>>>> diff I'm not sure what's what. Could you give a brief description of the
>>>>>>>> state of type-unique-simple2 files? What's involved? What's it meant to
>>>>>>>> test? What's it actually testing?
>>>>>>>
>>>>>>> I can add more comments. The source files are included in the testing
>>>>>>> case. I am checking that llvm-link only generates a single copy of the
>>>>>>> struct and the backend generates a single DIE and uses ref_addr.
>>>>>>> The changes are to check "the backend generates a single DIE and uses
>>>>>>> ref_addr".
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>   DIE.h:
>>>>>>>>     checkCompileUnit could probably be called
>>>>>>>> "getCompileUnitOrNull", the name "check*" seems to imply that it does some
>>>>>>>> checking, which isn't true.
>>>>>>>
>>>>>>> Will do.
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>   DwarfCompileUnit.cpp:
>>>>>>>>     the "istype || issubprogram" check should probably be pulled out
>>>>>>>> into a separate function, something like "isShareableAcrossCUs" or something
>>>>>>>> like that (please, that's not the best name, let's discuss it further before
>>>>>>>> we settle on a name) so that getDIE and insertDIE are sure to use the same
>>>>>>>> test at all times.
>>>>>>>
>>>>>>> Yes, the condition is shared between getDIE and insertDIE. I like
>>>>>>> isSharableAcrossCUs, because that is why the map is in DwarfDebug instead of
>>>>>>> CompileUnit.
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>   Why does addDIEEntry take a form? When does the caller get to
>>>>>>>> choose the form rather than the callee deciding between ref4 and ref_addr
>>>>>>>> based on context?
>>>>>>>
>>>>>>>
>>>>>>> addDIEEntry took a form before my change and I didn't check why it
>>>>>>> did.
>>>>>>> I will check if all callers always use ref4, if it it true, I will
>>>>>>> submit a cleanup patch first.
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>   I'm still unsure about this worklist thing - do your current tests
>>>>>>>> cover the need for the worklist? ie: if we removed that logic, would tests
>>>>>>>> fail? Can you describe a specific sequence where the worklist is necessary?
>>>>>>>
>>>>>>>
>>>>>>> If we are sure that DIEs are always added to an owner before calling
>>>>>>> addDIEEntry, we don't need the worklist.
>>>>>>> But I saw cases where that was not true, I will get a reduced testing
>>>>>>> case.
>>>>>>
>>>>>>
>>>>>> If we try to assert both DIEs have owner in addDIEEntry, the following
>>>>>> testing cases will fail:
>>>>>>     LLVM :: DebugInfo/X86/multiple-aranges.ll
>>>>>>     LLVM :: DebugInfo/X86/ref_addr_relocation.ll
>>>>>>     LLVM :: DebugInfo/X86/stmt-list-multiple-compile-units.ll
>>>>>>     LLVM :: DebugInfo/two-cus-from-same-file.ll
>>>>>>     LLVM :: Linker/type-unique-simple-a.ll
>>>>>>     LLVM :: Linker/type-unique-simple2.ll
>>>>>>
>>>>>> For ref_addr_relocation, it failed in
>>>>>> #5  0x0000000100b169ba in llvm::DwarfDebug::addDIEEntry
>>>>>> (this=0x103023600, Die=0x102e14090, Attribute=73, Entry=0x10302a9d0) at
>>>>>> /Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3071
>>>>>> #6  0x0000000100b040e0 in llvm::CompileUnit::addType
>>>>>> (this=0x102e13ec0, Entity=0x102e14090, Ty={<llvm::DIScope> =
>>>>>> {<llvm::DIDescriptor> = {DbgNode = 0x102e05f30}, <No data fields>}, <No data
>>>>>> fields>}, Attribute=73) at
>>>>>> /Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:910
>>>>>> #7  0x0000000100b05bff in llvm::CompileUnit::createGlobalVariableDIE
>>>>>> (this=0x102e13ec0, N=0x102e068c0) at
>>>>>> /Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1505
>>>>>>
>>>>>> If we look at DwarfCompileUnit.cpp:
>>>>>>     VariableDIE = new DIE(GV.getTag());
>>>>>>     // Add to map.
>>>>>>     insertDIE(N, VariableDIE);
>>>>>>
>>>>>>     // Add name and type.
>>>>>>     addString(VariableDIE, dwarf::DW_AT_name, GV.getDisplayName());
>>>>>>     addType(VariableDIE, GTy);
>>>>>>
>>>>>> The VariableDIE is not added to an owner yet when calling addType.
>>>>>
>>>>>
>>>>> I believe I have addressed all review comments, the patches are
>>>>> re-attached here for convenience.
>>>>
>>>>
>>>> So I'm still thinking about the work list work.
>>>>
>>>> If we don't know which CU a DIE is in - isn't it, necessarily, going to
>>>> be in the current CU (& thus referenced by ref_data4 (using a CU-local
>>>> offset), not ref_addr)?
>>>
>>>
>>> That may be true. But can we prove that?
>>
>>
>> We really shouldn't add extra complexity to the codebase just because we
>> don't know how the codebase works - that's what leads to the kind of
>> complexity we see in the debug info handling today.
>>
>>>
>>> There are two ways of handling this:
>>> 1> use a worklist to be conservative
>>> 2> do not use a worklist, but add an assertion when emitting a DIE A,
>>> make sure the DIE referenced with ref4 is indeed in the same CU as DIE A.
>>
>>
>> Please just add this assertion. If it fires we'll have a test case and a
>> concrete justification for this complexity such that should anyone remove it
>> later because it looked unnecessary, they'll at least have a failing test to
>> explain why it was there in the first place.
>>
>>>
>>>
>>> Let me know which one you prefer.
>>>
>>> Do you have any comments on the ref_addr patch (the 1st patch of the
>>> two)?
>>
>>
>> Nothing much - I wouldn't mind Eric taking a look (& would rather you not
>> commit this until you're committing the second patch, since it's otherwise
>> untested/unjustified code) on the label/offset related stuff since I'm less
>> familiar with that.
>
>
> Eric,
>
> Do you want to take a look at the 1st patch?
>

I do, I've been following the thread, but give me a day or so.

Thanks.

-eric

>>
>>
>> There's one or two cases of {} on single-statement blocks you could fix
>> up.
>
> I couldn't find that.
> Are you referring to:
>
> +      } else {
>
> +        Asm->EmitInt32(Addr);
>
> +      }
>
> It has a multi-statement if, I thought the rule is to have a parenthesis for
> the else as well.
>
>>
>>
>> "DebugInfoOffset" (both the member and the functions to set/get it) might
>> be more meaningfully named "SectionOffset" - but I'm not sure. Eric? Other
>> names (DebugInfoSectionOffset?)?
>
>
> Eric, any opinion here?
>
> Thanks,
> Manman
>>
>>
>>
>> - David
>
>



More information about the llvm-dev mailing list