[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

Manman Ren manman.ren at gmail.com
Tue Oct 15 13:35:57 PDT 2013


On Tue, Oct 15, 2013 at 1:31 PM, Eric Christopher <echristo at gmail.com>wrote:

> On Tue, Oct 15, 2013 at 1:29 PM, Manman Ren <manman.ren at gmail.com> wrote:
> >
> >
> >
> > On Tue, Oct 15, 2013 at 11:34 AM, David Blaikie <dblaikie at gmail.com>
> wrote:
> >>
> >>
> >>
> >>
> >> On Tue, Oct 15, 2013 at 10:51 AM, Manman Ren <manman.ren at gmail.com>
> wrote:
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Oct 15, 2013 at 10:10 AM, David Blaikie <dblaikie at gmail.com>
> >>> wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Oct 15, 2013 at 10:05 AM, Manman Ren <manman.ren at gmail.com>
> >>>> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Oct 9, 2013 at 5:22 PM, Manman Ren <manman.ren at gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Oct 9, 2013 at 1:32 PM, Manman Ren <manman.ren at gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> David,
> >>>>>>>
> >>>>>>> Thanks for reviewing!
> >>>>>>>
> >>>>>>> On Wed, Oct 9, 2013 at 11:36 AM, David Blaikie <dblaikie at gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Might be easier if these were on Phabricator, but here are some
> >>>>>>>> thoughts:
> >>>>>>>>
> >>>>>>>> 0001:
> >>>>>>>>   This patch generally, while separated for legibility, is
> untested
> >>>>>>>> & difficult to discuss in isolation.
> >>>>>>>
> >>>>>>> I agree, this patch adds the functionality but does not use it, the
> >>>>>>> 2nd patch uses ref_addr.
> >>>>>>> If you think I should merge the two and commit as a single patch,
> let
> >>>>>>> me know.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> I may need to refer to the second patch in reviewing this first
> one.
> >>>>>>>>   DwarfDebug.cpp:
> >>>>>>>>     computeSizeAndOffsets:
> >>>>>>>>       I believe this produces the wrong offset for the 3rd CU and
> >>>>>>>> onwards. computeSizeAndOffset returns the EndOffset which is
> absolute, not
> >>>>>>>> relative to the Offset passed in, so it should be assigned to
> SecOffset, not
> >>>>>>>> added to it. (eg: if you have CUs at 0, 42, and 57 - the first
> pass through
> >>>>>>>> SecOffset will be zero, then it'll be 0 + 42, then on the 3rd
> it'll be (0 +
> >>>>>>>> 42) + 57 which isn't right, it should just be 57). Please add
> more test
> >>>>>>>> coverage while fixing this issue.
> >>>>>>>
> >>>>>>>
> >>>>>>> computeSizeAndOffset takes an offset that is relative to the start
> of
> >>>>>>> the CU and returns the offset relative to the CU after laying out
> the DIE.
> >>>>>>> The initial offset before laying out the CU DIE is the header size,
> >>>>>>> EndOffset will be the offset relative to the CU after laying out
> the whole
> >>>>>>> CU DIE.
> >>>>>>> We can think of EndOffset as the size of the whole CU DIE.
> SecOffset
> >>>>>>> is the offset from the Debug Info section, so we can update it by
> adding the
> >>>>>>> CU size.
> >>>>>>>
> >>>>>>>   // Offset from the beginning of debug info section.
> >>>>>>>   unsigned SecOffset = 0;
> >>>>>>>   for (SmallVectorImpl<CompileUnit *>::iterator I = CUs.begin(),
> >>>>>>>          E = CUs.end(); I != E; ++I) {
> >>>>>>>     (*I)->setDebugInfoOffset(SecOffset);
> >>>>>>>     unsigned Offset =
> >>>>>>>       sizeof(int32_t) + // Length of Compilation Unit Info
> >>>>>>>       sizeof(int16_t) + // DWARF version number
> >>>>>>>       sizeof(int32_t) + // Offset Into Abbrev. Section
> >>>>>>>       sizeof(int8_t);   // Pointer Size (in bytes)
> >>>>>>>
> >>>>>>>     unsigned EndOffset = computeSizeAndOffset((*I)->getCUDie(),
> >>>>>>> Offset);
> >>>>>>>     SecOffset += EndOffset;
> >>>>>>>   }
> >>>>>>
> >>>>>>
> >>>>>> Added more comments in attached patch.
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Eric/Manman: rough design question: compute the absolute offset of
> >>>>>>>> each CU within the debug_info section and describe them all
> relative to a
> >>>>>>>> single symbol at the start of the debug_info section, or should
> we put a
> >>>>>>>> label at the start of each CU?
> >>>>>>>
> >>>>>>>
> >>>>>>> Either way should work. But since we already have the section
> offset
> >>>>>>> for each CU, describing relative to the start of debug_info
> section saves us
> >>>>>>> a few symbols :)
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 0002:
> >>>>>>>>   ref_addr_relocation.ll:
> >>>>>>>>     seems a bit vague in terms of how you test for the
> relocation. I
> >>>>>>>> think it'd make more sense to test the assembly, than the reafobj
> output,
> >>>>>>>> that way you can test that the correct bytes have the relocation
> rather than
> >>>>>>>> just that there's "some" .debug_info relocation in the file. For
> an example,
> >>>>>>>> see test/DebugInfo/X86/tls.ll I wrote - it has some "unannotated"
> bytes
> >>>>>>>> because we still don't have a nice way to annotate location bytes
> that are
> >>>>>>>> DWARF expressions, but it's close - I guess those should be
> CHECK-NEXTs,
> >>>>>>>> though. In any case, you should be able to check a few lines of
> assembly
> >>>>>>>> with the # DW_AT/DW_TAG annotation comments.
> >>>>>>>
> >>>>>>>
> >>>>>>> I can check for ".quad .Lsection_info+38 #DW_AT_type".
> >>>>>>
> >>>>>> Done in attached patch.
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>     You'd need to add the tu3.cpp from my example if you want to
> >>>>>>>> demonstrate that the relocation is actually working as intended
> and avoiding
> >>>>>>>> the bogus result I showed.
> >>>>>>>>   type-unique-simple-a.ll
> >>>>>>>>     While I agree that having common test cases helps reduce the
> >>>>>>>> number of separate invocations we have, this test case seems like
> it might
> >>>>>>>> be becoming a little hard to follow what's under test. Just going
> from the
> >>>>>>>> diff I'm not sure what's what. Could you give a brief description
> of the
> >>>>>>>> state of type-unique-simple2 files? What's involved? What's it
> meant to
> >>>>>>>> test? What's it actually testing?
> >>>>>>>
> >>>>>>> I can add more comments. The source files are included in the
> testing
> >>>>>>> case. I am checking that llvm-link only generates a single copy of
> the
> >>>>>>> struct and the backend generates a single DIE and uses ref_addr.
> >>>>>>> The changes are to check "the backend generates a single DIE and
> uses
> >>>>>>> ref_addr".
> >>>>>>
> >>>>>> Done in attached patch.
> >>>>>>>
> >>>>>>>
> >>>>>>>>   DIE.h:
> >>>>>>>>     checkCompileUnit could probably be called
> >>>>>>>> "getCompileUnitOrNull", the name "check*" seems to imply that it
> does some
> >>>>>>>> checking, which isn't true.
> >>>>>>>
> >>>>>>> Will do.
> >>>>>>
> >>>>>> Done in attached patch.
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>   DwarfCompileUnit.cpp:
> >>>>>>>>     the "istype || issubprogram" check should probably be pulled
> out
> >>>>>>>> into a separate function, something like "isShareableAcrossCUs"
> or something
> >>>>>>>> like that (please, that's not the best name, let's discuss it
> further before
> >>>>>>>> we settle on a name) so that getDIE and insertDIE are sure to use
> the same
> >>>>>>>> test at all times.
> >>>>>>>
> >>>>>>> Yes, the condition is shared between getDIE and insertDIE. I like
> >>>>>>> isSharableAcrossCUs, because that is why the map is in DwarfDebug
> instead of
> >>>>>>> CompileUnit.
> >>>>>>
> >>>>>> Done in attached patch.
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>   Why does addDIEEntry take a form? When does the caller get to
> >>>>>>>> choose the form rather than the callee deciding between ref4 and
> ref_addr
> >>>>>>>> based on context?
> >>>>>>>
> >>>>>>>
> >>>>>>> addDIEEntry took a form before my change and I didn't check why it
> >>>>>>> did.
> >>>>>>> I will check if all callers always use ref4, if it it true, I will
> >>>>>>> submit a cleanup patch first.
> >>>>>>
> >>>>>> Done in attached patch.
> >>>>>>>
> >>>>>>>
> >>>>>>>>   I'm still unsure about this worklist thing - do your current
> tests
> >>>>>>>> cover the need for the worklist? ie: if we removed that logic,
> would tests
> >>>>>>>> fail? Can you describe a specific sequence where the worklist is
> necessary?
> >>>>>>>
> >>>>>>>
> >>>>>>> If we are sure that DIEs are always added to an owner before
> calling
> >>>>>>> addDIEEntry, we don't need the worklist.
> >>>>>>> But I saw cases where that was not true, I will get a reduced
> testing
> >>>>>>> case.
> >>>>>>
> >>>>>>
> >>>>>> If we try to assert both DIEs have owner in addDIEEntry, the
> following
> >>>>>> testing cases will fail:
> >>>>>>     LLVM :: DebugInfo/X86/multiple-aranges.ll
> >>>>>>     LLVM :: DebugInfo/X86/ref_addr_relocation.ll
> >>>>>>     LLVM :: DebugInfo/X86/stmt-list-multiple-compile-units.ll
> >>>>>>     LLVM :: DebugInfo/two-cus-from-same-file.ll
> >>>>>>     LLVM :: Linker/type-unique-simple-a.ll
> >>>>>>     LLVM :: Linker/type-unique-simple2.ll
> >>>>>>
> >>>>>> For ref_addr_relocation, it failed in
> >>>>>> #5  0x0000000100b169ba in llvm::DwarfDebug::addDIEEntry
> >>>>>> (this=0x103023600, Die=0x102e14090, Attribute=73,
> Entry=0x10302a9d0) at
> >>>>>>
> /Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3071
> >>>>>> #6  0x0000000100b040e0 in llvm::CompileUnit::addType
> >>>>>> (this=0x102e13ec0, Entity=0x102e14090, Ty={<llvm::DIScope> =
> >>>>>> {<llvm::DIDescriptor> = {DbgNode = 0x102e05f30}, <No data fields>},
> <No data
> >>>>>> fields>}, Attribute=73) at
> >>>>>>
> /Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:910
> >>>>>> #7  0x0000000100b05bff in llvm::CompileUnit::createGlobalVariableDIE
> >>>>>> (this=0x102e13ec0, N=0x102e068c0) at
> >>>>>>
> /Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1505
> >>>>>>
> >>>>>> If we look at DwarfCompileUnit.cpp:
> >>>>>>     VariableDIE = new DIE(GV.getTag());
> >>>>>>     // Add to map.
> >>>>>>     insertDIE(N, VariableDIE);
> >>>>>>
> >>>>>>     // Add name and type.
> >>>>>>     addString(VariableDIE, dwarf::DW_AT_name, GV.getDisplayName());
> >>>>>>     addType(VariableDIE, GTy);
> >>>>>>
> >>>>>> The VariableDIE is not added to an owner yet when calling addType.
> >>>>>
> >>>>>
> >>>>> I believe I have addressed all review comments, the patches are
> >>>>> re-attached here for convenience.
> >>>>
> >>>>
> >>>> So I'm still thinking about the work list work.
> >>>>
> >>>> If we don't know which CU a DIE is in - isn't it, necessarily, going
> to
> >>>> be in the current CU (& thus referenced by ref_data4 (using a CU-local
> >>>> offset), not ref_addr)?
> >>>
> >>>
> >>> That may be true. But can we prove that?
> >>
> >>
> >> We really shouldn't add extra complexity to the codebase just because we
> >> don't know how the codebase works - that's what leads to the kind of
> >> complexity we see in the debug info handling today.
> >>
> >>>
> >>> There are two ways of handling this:
> >>> 1> use a worklist to be conservative
> >>> 2> do not use a worklist, but add an assertion when emitting a DIE A,
> >>> make sure the DIE referenced with ref4 is indeed in the same CU as DIE
> A.
> >>
> >>
> >> Please just add this assertion. If it fires we'll have a test case and a
> >> concrete justification for this complexity such that should anyone
> remove it
> >> later because it looked unnecessary, they'll at least have a failing
> test to
> >> explain why it was there in the first place.
> >>
> >>>
> >>>
> >>> Let me know which one you prefer.
> >>>
> >>> Do you have any comments on the ref_addr patch (the 1st patch of the
> >>> two)?
> >>
> >>
> >> Nothing much - I wouldn't mind Eric taking a look (& would rather you
> not
> >> commit this until you're committing the second patch, since it's
> otherwise
> >> untested/unjustified code) on the label/offset related stuff since I'm
> less
> >> familiar with that.
> >
> >
> > Eric,
> >
> > Do you want to take a look at the 1st patch?
> >
>
> I do, I've been following the thread, but give me a day or so.
>

Sure :)

Thanks,
Manman


>
> Thanks.
>
> -eric
>
> >>
> >>
> >> There's one or two cases of {} on single-statement blocks you could fix
> >> up.
> >
> > I couldn't find that.
> > Are you referring to:
> >
> > +      } else {
> >
> > +        Asm->EmitInt32(Addr);
> >
> > +      }
> >
> > It has a multi-statement if, I thought the rule is to have a parenthesis
> for
> > the else as well.
> >
> >>
> >>
> >> "DebugInfoOffset" (both the member and the functions to set/get it)
> might
> >> be more meaningfully named "SectionOffset" - but I'm not sure. Eric?
> Other
> >> names (DebugInfoSectionOffset?)?
> >
> >
> > Eric, any opinion here?
> >
> > Thanks,
> > Manman
> >>
> >>
> >>
> >> - David
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131015/4d5fe2fe/attachment.html>


More information about the llvm-dev mailing list