[PATCH] D32821: Add DWARF verifiers to verify address ranges are correct and scoped correctly.

Wed May 3 13:57:43 PDT 2017

> On May 3, 2017, at 1:45 PM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> 
> 
> On Wed, May 3, 2017 at 1:32 PM Greg Clayton via Phabricator <reviews at reviews.llvm.org <mailto:reviews at reviews.llvm.org>> wrote:
> clayborg added a comment.
> 
> In https://reviews.llvm.org/D32821#745213 <https://reviews.llvm.org/D32821#745213>, @dblaikie wrote:
> 
> > I haven't followed some of the design direction Adrian's been pointing this work in - so all of this is subject to discussion, etc:
> >
> > 1. I'd expect not to build an entire representation of all ranges up-front, then verify them (uses more memory, etc) - but rather verify during a traversal of the DIEs instead
> 
> 
> The whole reason we need this is to look for overlapping functions between all CUs. This helps us to identify exactly which DIEs overlap.
> 
> If it's true that no DIEs should overlap, and all DIEs should be a subset of their parent - then there's no need to compare two DIEs that aren't siblings* because they can't overlap. The only comparisons that should be needed would be between immediate siblings, and siblings and parents - not all possible pairs of nodes. Right?

That is how this is coded. Only top DW_TAG_subprogram DIEs get added to the AllFunctionRangeInfos. So when checking for overlap, we just check between adjacent DW_TAG_subprogram. Since these are in a std::set, they are ordered already. So we just check:

for I in AllFunctionRangeInfos.size():
  AllFunctionRangeInfos[I].Contains(AllFunctionRangeInfos[I+1])

> 
> * for a slightly broad definition of siblings - ignoring any nodes that don't have ranges (eg: A:R { B { C:R { } } D:R { } } - D and C are siblings in this sense)
>  
> It is also just a DIE (2 pointers) + vector<pair<uint64_t, uint64_t>. Not too costly.
> 
> > 2. for/for loops to check if one range is a subset of another's probably rather inefficient. Not sure if LLVM already has better tools for this, though I suspect not.
> 
> It currently works and I have seen many errors introduced by the compilers when they optimize debug info and they get it wrong.
> 
> Sure enough - I'm not suggesting the check is a bad idea.
>  
> We only check blocks and inline functions against the containing ranges.
> 
> Optimized code can produce /many/ blocks & inline functions, FWIW.
>  
> We can make sure the ranges are coalesced when needed, but other than that, what is innefficient about what is there?
> 
> RangeInfoType::DoesIntersect is O(N^2) - if the ranges are sorted it should be O(N), I would think?
> 
> & it looks like verifyDebugInfoTag, for example, calls CheckForRangeErrors against all other nodes at the same level (so, for example - if it was only looking at subprograms, each subprogram would be compared against all other subprograms that came before it - even those in other CUs, even when that other CU's ranges had been proven not to intersect with this CU - similarly one layer down, all the top level lexical_scopes would be compared against all other top level lexical scopes, even when the subprogram ranges for the two subprograms were distinct)

No, CheckForRangeErrors check a single DIEs address ranges and looks for low_pc > high_pc and removes any empty ranges (optimized out function often just have low_pc == high_pc). So CheckForRangeErrors only look at ranges in one single DIE.

>  
> 99% of the time it is one range seeing if one other range is contained.
> 
> >
> >
> > 3. I'm not sure that function ranges need to be special/separate - I'd expect any nested ranges (eg: the range of two scopes in a function) to be distinct as well, so I'd expect a more general algorithm that walks nodes and their children looking for ranges and checking that children are subsets of parents, and children are distinct from each other.
> 
> Again, not sure what is wrong with the way it is doing it. We just keep a stack >= depth of the current DIE with ranges filled in. If a block or inlined function is found, it verifies it is in the parent. The DWARF doesn't store this information in any useful way we can access, so I believe this approach work well.
> 
> I didn't mean to suggest it won't work - I'm confused by why there is a separate "verifyDebugInfoOVerlappingFunctionRanges" - I would've expected that to fall out naturally from the DIE walk range checking.

We don't detect overlaps until the end. And the only thing we verify on the fly is if the containing DIE (DW_TAG_subprogram, or DW_TAG_lexical_block or DW_TAG_inlined_subroutine) contains all address ranges for the child. So a child DW_TAG_lexical_block or DW_TAG_inlined_subroutine will check that its ranges are fully contained in the parent's ranges.

> 
> At the top level, I could imagine since CU DIEs don't have a common parent, the CU range overlap check might need to be special-cased rather than falling out naturally from the DIE walk - but only that.

Child DW_TAG_subprogram DIEs verify that their address ranges are contained in DieRangeInfos[0]. 

So I think everything is happened pretty efficiently. I can improve both DWARFVerifier::RangeInfoType::Contains() and DWARFVerifier::RangeInfoType::DoesIntersect() to use std::lower_bound since we sort all ranges in CheckForRangeErrors if that is what you are worried about?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170503/573dd98c/attachment.html>