[PATCH] D110363: [DWARF][NFC] add ParentIdx and SiblingIdx to DWARFDebugInfoEntry for faster navigation.

Tue Sep 28 12:53:03 PDT 2021

dblaikie added a comment.

In D110363#3028338 <https://reviews.llvm.org/D110363#3028338>, @avl wrote:

>>>   The assertions can't be done, that is true, but the theory is if we parse all of the DIEs correctly, they should all have valid values for the ParentIdx and SiblingIdx up front and we should have nothing to worry about. We should create the data correctly one time and then trust it. We have been using this in the LLDB parser for many years and it is quite stable and efficient.
>
>
>
>> Yep, +1 to that. If these offsets were based on parsed input, then there'd be a possibility of them being incorrect/invalid/out of range, but they aren't - they'd be computed based on the hierarchy created in memory by libDebugInfo.
>
> Agreed that data should be created correctly and then no need to insert *run time* checks for them. But things looks differently for the assertions - that is a purpose of assertions to prove things which should be correct. In such case if any error occurred - the assertion will show it. If some incorrect memory access from new other code would overwrite ParentIdx then assertion will show it.
>
> Other then assertions there are cases when we still need run-time checks for array boundaries, even for correctly parsed DWARF:
>
>     DWARFDie DWARFUnit::getFirstChild(const DWARFDebugInfoEntry *Die) {
>     if (!Die->hasChildren())
>       return DWARFDie();
>   
>     // We do not want access out of bounds when parsing corrupted debug data.
>     size_t I = getDIEIndex(Die) + 1;
>     if (I >= DieArray.size())    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>       return DWARFDie();
>     return DWARFDie(this, &DieArray[I]);
>   }

Yeah, we could classify the data as invalid sooner - like we wouldn't add a partial DIE (one where its abbreviation-defined size is larger than the remaining buffer space to read from) we perhaps could consider a DIE that says it has children but there's no space for children or no null child DIE to define the end of the list - as invalid too.

>   DWARFDie DWARFUnit::getPreviousSibling(const DWARFDebugInfoEntry *Die) {
>     if (!Die)
>       return DWARFDie();
>   
>     ...
>   
>     // Find the previous DIE whose parent is the same as the Die's parent.
>     for (size_t I = getDIEIndex(Die); I > 0;) {   <<<<<<<<<< We need to know DieArray boundary to properly search for the previous sibling.
>       --I;
>       
>
> In short, I think having assertions is useful, but if you think they are not needed then I am OK to removing them. Without assertions we still have a cases when array size should be known for proper navigation. Thus it looks correct to leave this navigation API(getParent, GetSibling, GetPrevSibling, GetFirstChild, GetLastChild) inside DWARFUnit(Since DWARFDebugInfoEntry does not know about DieArray).

What does LLDB use here? Does it not offer a 'previous sibling' API?

(though, honestly - if some operations require the underlying array anyway - I think using absolute indexes is probably reasonable unless there's some compelling performance issues)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110363/new/

https://reviews.llvm.org/D110363