[PATCH] D32821: Add DWARF verifiers to verify address ranges are correct and scoped correctly.

Mon May 8 16:03:28 PDT 2017

On Wed, May 3, 2017 at 1:57 PM Greg Clayton <clayborg at gmail.com> wrote:

> On May 3, 2017, at 1:45 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Wed, May 3, 2017 at 1:32 PM Greg Clayton via Phabricator <
> reviews at reviews.llvm.org> wrote:
>
>> clayborg added a comment.
>>
>> In https://reviews.llvm.org/D32821#745213, @dblaikie wrote:
>>
>> > I haven't followed some of the design direction Adrian's been pointing
>> this work in - so all of this is subject to discussion, etc:
>> >
>> > 1. I'd expect not to build an entire representation of all ranges
>> up-front, then verify them (uses more memory, etc) - but rather verify
>> during a traversal of the DIEs instead
>>
>>
>> The whole reason we need this is to look for overlapping functions
>> between all CUs. This helps us to identify exactly which DIEs overlap.
>
>
> If it's true that no DIEs should overlap, and all DIEs should be a subset
> of their parent - then there's no need to compare two DIEs that aren't
> siblings* because they can't overlap. The only comparisons that should be
> needed would be between immediate siblings, and siblings and parents - not
> all possible pairs of nodes. Right?
>
>
> That is how this is coded. Only top DW_TAG_subprogram DIEs get added to
> the AllFunctionRangeInfos. So when checking for overlap, we just check
> between adjacent DW_TAG_subprogram. Since these are in a std::set, they are
> ordered already. So we just check:
>
> for I in AllFunctionRangeInfos.size():
>   AllFunctionRangeInfos[I].Contains(AllFunctionRangeInfos[I+1])
>

Ah, that makes loads of sense - nice!

So why aren't the lexical blocks scopes handled the same way?

(& indeed, this is what I'm getting at about the duplicate code across this
- I'm not understanding why there are two different ways of testing
intersects or subset - and two different codepaths for the error messages
too)

What I'm picturing is a single device, much like what I wrote in the patch
I provided last week, I think. That device/class represents a scope (in the
general sense, not the lexical_scope sense), with an optional parent node +
range and a list of children node+ranges. There would be an explicit
instance for the file level (with no parent) that is used by every
compile_unit's ranges. Another (hopefully local/non-member version, passed
down - but it could be a member) created when visiting a top level compile
unit - for its subprograms. And any number more created on the stack for
the lexical scopes.

This object would handle error message printing for both sibling overlap
and not-contained-in-parent errors, in one place with one
message/handling/etc.

>
>
> * for a slightly broad definition of siblings - ignoring any nodes that
> don't have ranges (eg: A:R { B { C:R { } } D:R { } } - D and C are siblings
> in this sense)
>
>
>> It is also just a DIE (2 pointers) + vector<pair<uint64_t, uint64_t>. Not
>> too costly.
>
>
>> > 2. for/for loops to check if one range is a subset of another's
>> probably rather inefficient. Not sure if LLVM already has better tools for
>> this, though I suspect not.
>>
>> It currently works and I have seen many errors introduced by the
>> compilers when they optimize debug info and they get it wrong.
>
>
> Sure enough - I'm not suggesting the check is a bad idea.
>
>
>> We only check blocks and inline functions against the containing ranges.
>
>
> Optimized code can produce /many/ blocks & inline functions, FWIW.
>
>
>> We can make sure the ranges are coalesced when needed, but other than
>> that, what is innefficient about what is there?
>
>
> RangeInfoType::DoesIntersect is O(N^2) - if the ranges are sorted it
> should be O(N), I would think?
>
> & it looks like verifyDebugInfoTag, for example, calls CheckForRangeErrors
> against all other nodes at the same level (so, for example - if it was only
> looking at subprograms, each subprogram would be compared against all other
> subprograms that came before it - even those in other CUs, even when that
> other CU's ranges had been proven not to intersect with this CU - similarly
> one layer down, all the top level lexical_scopes would be compared against
> all other top level lexical scopes, even when the subprogram ranges for the
> two subprograms were distinct)
>
>
> No, CheckForRangeErrors check a single DIEs address ranges and looks for
> low_pc > high_pc and removes any empty ranges (optimized out function often
> just have low_pc == high_pc). So CheckForRangeErrors only look at ranges in
> one single DIE.
>
>
>
>> 99% of the time it is one range seeing if one other range is contained.
>>
>> >
>> >
>> > 3. I'm not sure that function ranges need to be special/separate - I'd
>> expect any nested ranges (eg: the range of two scopes in a function) to be
>> distinct as well, so I'd expect a more general algorithm that walks nodes
>> and their children looking for ranges and checking that children are
>> subsets of parents, and children are distinct from each other.
>>
>> Again, not sure what is wrong with the way it is doing it. We just keep a
>> stack >= depth of the current DIE with ranges filled in. If a block or
>> inlined function is found, it verifies it is in the parent. The DWARF
>> doesn't store this information in any useful way we can access, so I
>> believe this approach work well.
>>
>
> I didn't mean to suggest it won't work - I'm confused by why there is a
> separate "verifyDebugInfoOVerlappingFunctionRanges" - I would've expected
> that to fall out naturally from the DIE walk range checking.
>
>
> We don't detect overlaps until the end. And the only thing we verify on
> the fly is if the containing DIE (DW_TAG_subprogram,
> or DW_TAG_lexical_block or DW_TAG_inlined_subroutine) contains all address
> ranges for the child. So a child DW_TAG_lexical_block
> or DW_TAG_inlined_subroutine will check that its ranges are fully contained
> in the parent's ranges.
>
>
> At the top level, I could imagine since CU DIEs don't have a common
> parent, the CU range overlap check might need to be special-cased rather
> than falling out naturally from the DIE walk - but only that.
>
>
> Child DW_TAG_subprogram DIEs verify that their address ranges are
> contained in DieRangeInfos[0].
>
> So I think everything is happened pretty efficiently. I can improve both
> DWARFVerifier::RangeInfoType::Contains()
> and DWARFVerifier::RangeInfoType::DoesIntersect() to use std::lower_bound
> since we sort all ranges in CheckForRangeErrors if that is what you are
> worried about?
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170508/0e257db6/attachment.html>