[PATCH] D32821: Add DWARF verifiers to verify address ranges are correct and scoped correctly.

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Thu May 4 10:01:21 PDT 2017


On Thu, May 4, 2017 at 9:57 AM Adrian Prantl <aprantl at apple.com> wrote:

> On May 4, 2017, at 9:46 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Thu, May 4, 2017 at 9:39 AM Greg Clayton <clayborg at gmail.com> wrote:
>
>> On May 3, 2017, at 7:38 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> For illustrative purposes I've attached a version (a patch on top of your
>> patch) of some of what I had in mind - there's a few FIXMEs in there where
>> I stumbled across untested functionality in your patch - so whichever
>> patch/direction ends up being used, there's some missing test coverage to
>> address.
>>
>> One of the big things I'd misunderstood/hadn't expected was that the
>> visit(DIE) function wasn't recursive. Though the approach I've taken could
>> be implemented without the recursion (checking the depth of nodes and
>> reducing the size of the vector as depths are traversed) I think it's
>> probably a bit easier to read this way.
>>
>> It seems to simplify things quite a bit, to my mind - and adds a few
>> missing cases in multiple directions.
>>
>>
>> I like the recursion aspect.
>>
>>
>> It looks like the original patch doesn't catch overlap between two
>> lexical scopes in the same subprogram at the same level?
>>
>>
>> Yep it doesn't. I will add that.
>>
>> Also it looks like the overlapping function code only considers pairs
>> ({1, 2}, {2, 3}, and so on) - so what about if {1, 3} overlap but neither
>> overlap 2?
>>
>>
>> DieRangeInfos[0] = DW_TAG_compile_unit [0x0000000000001000 -
>> 0x0000000000003000)
>> DieRangeInfos[1] = DW_TAG_subprogram   [0x0000000000001000 -
>> 0x0000000000002000)
>> DieRangeInfos[2] = DW_TAG_lexical_block[0x0000000000001100 -
>> 0x0000000000001200)
>> DieRangeInfos[3] = DW_TAG_subprogram   [0x0000000000003000 -
>> 0x0000000000004000)
>> DieRangeInfos[4] = DW_TAG_lexical_block[0x0000000000003100 -
>> 0x0000000000003200)
>>
>> For the example above we check if any DW_TAG_subprogram is contained
>> in DW_TAG_compile_unit if it has ranges, so it would check [0] contains
>> [1], and [0] contains [2]. There would be no need to check if [1] or [2]
>> contain [3] since it assumed to be its own function. Granted I haven't seen
>> DWARF like this. But the stack still would easily verify all lexical blocks
>> and inlined subroutines correctly , except the overlap between sibling
>> blocks/inlined functions.
>>
>> Each DW_TAG_subprogram would check any contained DW_TAG_lexical_block or
>> DW_TAG_inlined_subroutine ([1] contains [2], [3] contains [4]). The patch
>> assumes if for some reason a DW_TAG subprogram is contained within anything
>> else that is is stand alone.
>>
>
> Adrian (Paul?) any thoughts on this? I know it's perhaps overly
> simplistic, but I like the idea of implementing the "if this DIE has
> ranges, those ranges must be a subrange of the nearest parent with ranges
> and not overlap with any other children of that parent" rule.
>
> This is perhaps overly restrictive if you allow for a subprogram defined
> within another subprogram (where the inner one isn't strictly part of the
> address range of the outer?)
>
>
> Wouldn't this break for Pascal-style inner functions, where the inner
> function's DW_TAG_subprogram is a child of the outer one's, but the address
> ranges are entirely non-overlapping?
>

That's the question - do you have an example of DWARF there?

I mean C++ has nested functions (lambdas, blocks, etc) but the DWARF that
is usually produced doesn't actually involved nested definition subprograms
- I've always seen the function declared locally, but defined elsewhere.


>
> -- adrian
>
> and overly permissive in that it would allow a lexical_block without
> ranges to be the parent to a lexical_block with ranges. (though perhaps
> it's better to catch that with a separate pass about which blocks should
> and shouldn't have ranges)
>
>
>> I'll come up with a hybrid of the two and repost.
>>
>>
>> (& I also left some code that would catch the case where ranges within a
>> single range list would overlap - perhaps not wrong, but seems a bit
>> broken?)
>>
>>
>>
>> On Wed, May 3, 2017 at 2:17 PM Greg Clayton <clayborg at gmail.com> wrote:
>>
>>> There should not have been a DW_AT_low_pc on the class type...
>>>
>>> On May 3, 2017, at 2:15 PM, Greg Clayton <clayborg at gmail.com> wrote:
>>>
>>>
>>> On May 3, 2017, at 1:47 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, May 3, 2017 at 1:43 PM Greg Clayton via Phabricator <
>>> reviews at reviews.llvm.org> wrote:
>>>
>>>> clayborg added a comment.
>>>>
>>>> One thing to clarify:
>>>>
>>>>   std::vector<RangeInfoType> DieRangeInfos;
>>>>
>>>> is a vector that only gets as deep as the DIE depth. It doesn't get
>>>> every range for every function, block, or inlined function, it just has a
>>>> stack of ranges for the _current_ DIE depth.
>>>>
>>>
>>> Which is part of what I'm confused by - I wouldn't expect to see/don't
>>> understand why there's any code that special-cases subprograms here. They
>>> don't seem special to me.
>>>
>>>
>>> They are special because they are top level functions. DW_TAG_subprogram
>>> appear at any depth and their parent isn't always a DW_TAG_compile_unit. So
>>> we might have DWARF like:
>>>
>>> 0x0000000b: DW_TAG_compile_unit [1] *
>>>              DW_AT_low_pc( 0x0000000000001000 )
>>>              DW_AT_high_pc( 0x0000000000003000 )
>>>              DW_AT_name( "/tmp/main.c" )
>>>
>>> 0x00000020:     DW_TAG_subprogram [2] *
>>>                  DW_AT_name( "main" )
>>>                  DW_AT_low_pc( 0x0000000000001000 )
>>>                  DW_AT_high_pc( 0x0000000000002000 )
>>>
>>> 0x00000035:         DW_TAG_lexical_block [3]
>>>                      DW_AT_low_pc( 0x0000000000001100 )
>>>                      DW_AT_high_pc( 0x0000000000001200 )
>>>
>>> 0x00000046:         NULL
>>>
>>> 0x00000047:     DW_TAG_class_type [4] *
>>>                  DW_AT_low_pc( 0x0000000000001000 )
>>>                  DW_AT_name( "Foo" )
>>>
>>> 0x00000054:         DW_TAG_subprogram [2] *
>>>                      DW_AT_name( "Bar" )
>>>                      DW_AT_low_pc( 0x0000000000002000 )
>>>                      DW_AT_high_pc( 0x0000000000003000 )
>>>
>>> 0x00000069:             DW_TAG_lexical_block [3]
>>>                          DW_AT_low_pc( 0x0000000000002100 )
>>>                          DW_AT_high_pc( 0x0000000000002200 )
>>>
>>> 0x0000007a:             NULL
>>>
>>> 0x0000007b:         NULL
>>>
>>> 0x0000007c:     NULL
>>>
>>> When visiting the DW_TAG_subprogram at 0x00000020 we would do:
>>>
>>> Depth = Die.getDepth(); // depth will be 1
>>> if (!DieRangeInfos[0].Ranges.empty())
>>>   assert(DieRangeInfos[0].Contains(DieRangeInfos[Depth].Ranges)
>>>
>>> since 0x00000020 is at depth 1
>>>
>>> Our DieRangeInfos stack would be:
>>>
>>> DieRangeInfos[0] = DW_TAG_compile_unit [0x0000000000001000 -
>>> 0x0000000000003000)
>>> DieRangeInfos[1] = DW_TAG_subprogram   [0x0000000000001000 -
>>> 0x0000000000002000)
>>>
>>> When visiting the DW_TAG_lexical_block at 0x00000035 we would only check:
>>>
>>> assert(DieRangeInfos[Depth-1].Contains(DieRangeInfos[Depth].Ranges))
>>>
>>> Our DieRangeInfos stack would be:
>>>
>>> DieRangeInfos[0] = DW_TAG_compile_unit [0x0000000000001000 -
>>> 0x0000000000003000)
>>> DieRangeInfos[1] = DW_TAG_subprogram   [0x0000000000001000 -
>>> 0x0000000000002000)
>>> DieRangeInfos[2]
>>> = DW_TAG_lexical_block[0x0000000000001100 - 0x0000000000001200)
>>>
>>> When visiting 0x00000054 we would again check against the compile units
>>> ranges:
>>>
>>> Depth = Die.getDepth(); // depth will be 2
>>> if (!DieRangeInfos[0].Ranges.empty())
>>>   assert(DieRangeInfos[0].Contains(DieRangeInfos[Depth].Ranges)
>>>
>>> DieRangeInfos[0] = DW_TAG_compile_unit [0x0000000000001000 -
>>> 0x0000000000003000)
>>> DieRangeInfos[1] = DW_TAG_class_type
>>> DieRangeInfos[2] = DW_TAG_subprogram
>>> [0x0000000000002000 - 0x0000000000003000)
>>>
>>> Then visiting 0x00000069 we do:
>>>
>>> assert(DieRangeInfos[Depth-1].Contains(DieRangeInfos[Depth].Ranges))
>>>
>>> DieRangeInfos[0] = DW_TAG_compile_unit [0x0000000000001000 -
>>> 0x0000000000003000)
>>> DieRangeInfos[1] = DW_TAG_class_type
>>> DieRangeInfos[2] = DW_TAG_subprogram
>>> [0x0000000000002000 - 0x0000000000003000)
>>> DieRangeInfos[3]
>>> = DW_TAG_lexical_block[0x0000000000002100 - 0x0000000000002200)
>>>
>>>
>>> I'd expect an algorithm that starts at the CU DIE and looks for any DIE
>>> that has ranges. It wouldn't need special cases for any tag types.
>>>
>>>
>>> You only need to verify your range against something your parent. For
>>> some, like a DW_TAG_subprogram, it must be in the DW_TAG_compile_unit only
>>> if it has ranges. For DW_TAG_lexical_block and DW_TAG_inlined_subroutine,
>>> they MUST be in their parents if their ranges are valid.
>>>
>>> Fair point - I was trying to think of if there were any particular
>> scopes inside a function that could not have ranges, guess not. I could see
>> adding that as a direct case in my example ("if tag type is
>> <CU/lexical_block/inlined_subroutine> it must have ranges" - actually,
>> though, now I reflect on that, that isn't true - abstract origins will have
>> lexical blocks but they won't have any ranges on them)
>>
>>>
>>> It is also possible in other languages to have functions declared inside
>>> of functions.
>>>
>>>
>> Do you have some concrete examples of this? Would be good to see wat sort
>> of DWARF existing providers/consumers are using here.
>>
>>
>>> In that case you want to only verify against your parent and stop at the
>>> DW_TAG_subprogram level. You could have a stack like:
>>>
>>>
>>> DieRangeInfos[0] = DW_TAG_compile_unit [0x0000000000001000 -
>>> 0x0000000000003000)
>>> DieRangeInfos[1] = DW_TAG_subprogram   [0x0000000000001000 -
>>> 0x0000000000002000)
>>> DieRangeInfos[2]
>>> = DW_TAG_lexical_block[0x0000000000001100 - 0x0000000000001200)
>>> DieRangeInfos[3] = DW_TAG_subprogram   [0x0000000000003000 -
>>> 0x0000000000004000)
>>> DieRangeInfos[4]
>>> = DW_TAG_lexical_block[0x0000000000003100 - 0x0000000000003200)
>>>
>>> You would only want to see if:
>>> - [4] is in [3]
>>> - [2] is in [1]
>>> - [3] and [1] are in [0]
>>>
>>> Make sense?
>>>
>>>
>> At least the DWARF I've seen for cases like this would have the function
>> declared locally inside the outer function, but the DWARF for the
>> definition would be separate/outside. I'd be curious to see a
>> producer/consumer using this nested idiom. Do you have an example?
>>
>>
>> <die_verification.diff>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170504/3665a157/attachment-0001.html>


More information about the llvm-commits mailing list