[PATCH] D28303: Add iterator support to DWARFDie to allow child DIE iteration.

Tue Jan 10 11:30:26 PST 2017

> On Jan 10, 2017, at 7:38 AM, David Blaikie via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
> 
> 
> On Mon, Jan 9, 2017 at 5:50 PM Chris Bieneman <cbieneman at apple.com> wrote:
>> On Jan 9, 2017, at 4:00 PM, David Blaikie via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>> 
>> 
>> 
>> On Mon, Jan 9, 2017 at 3:55 PM Chris Bieneman <cbieneman at apple.com> wrote:
>>> On Jan 9, 2017, at 3:49 PM, David Blaikie via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>> 
>>> 
>>> 
>>> On Mon, Jan 9, 2017 at 3:46 PM Chris Bieneman <cbieneman at apple.com> wrote:
>>> At the moment the DWARF APIs' lack of fine-grained error reporting makes the ObjectYAML tools not well suited to this.
>>> 
>>> Not sure I follow - the intent here would be to potentially use ObjectYAML (perhaps via API rather than command line?) to produce an input to a test case.
>>> 
>>> Where does the DWARF API's granularity impact ObjectYAML's suitability?
>> 
>> Ah! I think I misunderstood what you were asking. The ObjectYAML library currently doesn't contain the code for writing the byte streams; that code presently lives in the tool rather than the library. I could move it without much trouble, and that would probably make it pretty straightforward to generate these kinds of malformed DWARF files in memory with API calls.
>> 
>> Basically the way this would work is you would populate a DWARFYAML::Data object with the values you want written out, and you would make a series of API calls to write the DWARF data into raw_ostreams. If that is an interesting application I can probably work up a patch later this week.
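
A minimal sketch of that populate-then-write flow, under stated assumptions: `SketchUnit`, `writeLE`, `writeULEB128` and `writeUnit` are hypothetical names invented for illustration and are not the DWARFYAML or ObjectYAML API, and `std::ostream` stands in for `raw_ostream`. The field order follows the 32-bit DWARF v4 compile unit header (unit_length, version, debug_abbrev_offset, address_size). Unlike the YAML quoted further down in this thread, where `Length` is an explicit value, the sketch computes the length from the bytes it emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one unit's worth of data; NOT the DWARFYAML type.
struct SketchUnit {
  uint16_t Version = 4;
  uint32_t AbbrOffset = 0;
  uint8_t AddrSize = 8;
  std::vector<uint64_t> AbbrCodes; // one abbreviation code per DIE, 0 = null DIE
};

// Append a fixed-width integer to the stream in little-endian byte order.
template <typename T> void writeLE(std::ostream &OS, T Value) {
  for (size_t I = 0; I < sizeof(T); ++I)
    OS.put(static_cast<char>((Value >> (8 * I)) & 0xFF));
}

// Append an unsigned LEB128 value, the encoding DWARF uses for abbreviation codes.
void writeULEB128(std::ostream &OS, uint64_t Value) {
  do {
    uint8_t Byte = Value & 0x7F;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    OS.put(static_cast<char>(Byte));
  } while (Value != 0);
}

// Write a 32-bit DWARF v4 unit: the header followed by one abbreviation code
// per DIE. Attribute values are left out to keep the sketch short.
void writeUnit(std::ostream &OS, const SketchUnit &Unit) {
  std::ostringstream Body;
  writeLE<uint16_t>(Body, Unit.Version);
  writeLE<uint32_t>(Body, Unit.AbbrOffset);
  writeLE<uint8_t>(Body, Unit.AddrSize);
  for (uint64_t Code : Unit.AbbrCodes)
    writeULEB128(Body, Code);
  const std::string Bytes = Body.str();
  // unit_length counts everything after the length field itself.
  writeLE<uint32_t>(OS, static_cast<uint32_t>(Bytes.size()));
  OS.write(Bytes.data(), static_cast<std::streamsize>(Bytes.size()));
}
```

A caller would fill in a `SketchUnit` (for example with `AbbrCodes = {1, 0}`), pass it to `writeUnit` along with a `std::ostringstream`, and hand the resulting buffer to the parser under test. Letting a test override the computed length would be one way to produce deliberately malformed input.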
>> 
>> Open to other ideas - but I expect we'll have cases that we won't be reasonably able to catch with dwarfdump or round-tripping through yaml - where we'll want to test the API directly for interesting interactions.
> 
> I agree. I'll look at building out library APIs for this. We also may be able to merge this and Greg's DWARFGenerator APIs together to reduce functional duplication.
> 
> *nod* that'd be great
>  
> 
>>  
>> 
>>>  
>>> As I mentioned in another thread related to my YAML tools, I would actually like to feed fine-grained error reporting using llvm::Error through the DWARF APIs in the future. Adding that support would allow the ObjectYAML tools to exercise cases like this either via llvm-dwarfdump, or just by round-tripping YAML->binary->YAML.
>>> 
>>> Not sure I'm following the roundtripping - oh, because you use LLVM's DWARF APIs to parse the binary then produce YAML from that. Doesn't that mean you wouldn't be able to use this to create invalid input test cases from existing binaries?
>> 
>> Yes, the existing tools can be used to create invalid input test cases from existing invalid binaries. Part of why I like the YAML tools so much is that they give us a way of capturing bad data from live sources.
>> 
>> Not sure I follow here. It sounded like you were saying you use the existing LLVM DWARF APIs to parse object files to produce YAML, yes?
> 
> The answer here is not as simple as I would like. The LLVM DWARF parser doesn't retain all the data it parses, so in some places the dwarf->yaml tool has its own limited parsing support. An example of this is in the line table, where we read from the stream in the state machine evaluation. There is no parser that maintains the state machine instructions.
> 
>> So if those APIs fail on invalid DWARF (probably not with sufficient granularity to describe the exact bits that were invalid, I should expect - that would be an unreasonable/burdensome requirement for the error handling, I would imagine) - how would they be used to produce YAML describing invalid DWARF?
> 
> What we've discussed doing for libObject (which I think would somewhat apply here) is a mechanism for filtering errors and classifying structural vs semantic errors. Generally this maps more cleanly for MachO than DWARF because MachO is largely self-describing.
> 
> The idea is that a structural error causes the parser to throw its hands up because there is nothing that can be done, and in those cases obj2yaml can't be used to generate a test case, but you may be able to hand edit a valid yaml file to generate invalid DWARF. Alternatively, semantic errors really just mean the binary data is structurally understandable, but complete nonsense. In those cases the obj2yaml will generate proper YAML.
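
A minimal sketch of the structural vs semantic split, under stated assumptions: `ParseErrorKind`, `ParseResult`, `checkUnitHeader` and `canEmitYAML` are hypothetical names, not an existing libObject or DWARF API, and the two checks shown (a unit length running past the section, an unusual address size) are toy examples chosen for illustration rather than rules taken from this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical classification; not an existing LLVM type.
enum class ParseErrorKind { None, Semantic, Structural };

struct ParseResult {
  ParseErrorKind Kind;
  std::string Message;
};

// Toy check of a 32-bit unit header against the size of its section.
ParseResult checkUnitHeader(uint64_t SectionSize, uint64_t UnitOffset,
                            uint32_t UnitLength, uint8_t AddrSize) {
  // Structural: the unit claims bytes the section does not have, so the
  // parser cannot find where the next unit starts.
  if (UnitOffset > SectionSize || SectionSize - UnitOffset < 4 ||
      SectionSize - UnitOffset - 4 < UnitLength)
    return {ParseErrorKind::Structural, "unit extends past end of section"};
  // Semantic (an assumption made for this sketch): the bytes can be walked,
  // but the value is not one the toy checker expects.
  if (AddrSize != 4 && AddrSize != 8)
    return {ParseErrorKind::Semantic, "unexpected address size"};
  return {ParseErrorKind::None, ""};
}

// Only a structural error prevents dumping what was read as YAML.
bool canEmitYAML(const ParseResult &R) {
  return R.Kind != ParseErrorKind::Structural;
}
```

With this split, a dumping tool would refuse only when `canEmitYAML` returns false, while a validating tool would report anything other than `ParseErrorKind::None`.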
> 
> What sort of structural errors do you have in mind in DWARF?

The simplest structural error I've encountered was running dwarfdump on an object file where some of the DWARF sections were zeroed out.

I haven't actually tracked down where that went wrong, but I know I hit an infinite loop somewhere in the DWARF parser.

I imagine we can find whole classes of other issues using fuzz testing of the DWARF parser, and I'm hopeful that I'll be able to spend some cycles on that.
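
A minimal sketch of what a fuzz target for that could look like, under stated assumptions: `walkRecords` parses a made-up record format, not DWARF, and the zero-size case it guards against is only an illustration of how all-zero input can stall a parser, not the diagnosed cause of the loop mentioned above. `LLVMFuzzerTestOneInput` is libFuzzer's entry point; a real target would call into the DWARF parser instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy format (not DWARF): each record begins with a one-byte
// size that counts the size byte itself.
// Returns the number of records walked, or -1 if the data is malformed.
int walkRecords(const uint8_t *Data, size_t Size) {
  size_t Offset = 0;
  int Count = 0;
  while (Offset < Size) {
    size_t RecordSize = Data[Offset];
    // A size of zero would leave Offset unchanged and loop forever, and a
    // size past the end would read out of bounds, so reject both.
    if (RecordSize == 0 || RecordSize > Size - Offset)
      return -1;
    Offset += RecordSize;
    ++Count;
  }
  return Count;
}

// libFuzzer calls this repeatedly with generated inputs; crashes and hangs
// are reported by the fuzzer itself, so the return value is always 0.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  walkRecords(Data, Size);
  return 0;
}
```

The forward-progress check is the point of the sketch: any loop that advances by a length read from the input needs to treat "no progress" as an error.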

-Chris

> For now I don't think we validate much of anything. In particular we don't validate the schema, i.e. that certain attributes and tags appear only in the contexts the spec suggests they should. (The spec is fairly permissive, as they like to say, so it doesn't say you /must/ only put these attributes/tags in this arrangement.)
>  
> In the MachO YAML tools I also have fallbacks that kick in when the tool doesn't understand the data it is reading, but it does know the size. In those cases it falls back to encoding a hex array. I could add similar mechanisms to the DWARF yaml, to allow capturing partial well-formed DWARF, and hex bytes for things that the parser falls over on.
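
A minimal sketch of that known-size fallback, under stated assumptions: `FieldKind`, `encodeField` and `toHexArray` are hypothetical names, and the `[ 0xDE, 0xAD ]` layout is illustrative rather than the exact text the MachO YAML tools emit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical field kinds: one the tool understands, one it does not.
enum class FieldKind { U32, Unknown };

// Render raw bytes as a flow sequence of hex values.
std::string toHexArray(const std::vector<uint8_t> &Bytes) {
  std::string Out = "[ ";
  for (size_t I = 0; I < Bytes.size(); ++I) {
    char Buf[8];
    std::snprintf(Buf, sizeof(Buf), "0x%02X", static_cast<unsigned>(Bytes[I]));
    Out += Buf;
    if (I + 1 != Bytes.size())
      Out += ", ";
  }
  Out += " ]";
  return Out;
}

// Decode the field when its kind and size are understood; otherwise keep the
// bytes verbatim so the data can still be written back out unchanged.
std::string encodeField(FieldKind Kind, const std::vector<uint8_t> &Bytes) {
  if (Kind == FieldKind::U32 && Bytes.size() == 4) {
    uint32_t Value = 0;
    for (size_t I = 0; I < 4; ++I)
      Value |= static_cast<uint32_t>(Bytes[I]) << (8 * I); // little-endian
    return std::to_string(Value);
  }
  return toHexArray(Bytes);
}
```

Because the fallback preserves every byte, a YAML->binary->YAML round trip stays lossless even for data the parser cannot interpret.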
> 
> Does this make sense and answer your questions?
> 
> More or less - thanks!
> 
> - Dave
>  
> 
> -Chris
> 
>> 
>> - David
>>  
>> 
>> -Chris
>> 
>>>  
>>> Similar to discussions that have revolved around libObject, if we had fine-grained error reporting and API calls to "validate" the data, we could build an llvm-objvalidate tool that could be used to test these kinds of conditions using the YAML-encoded object files. That would be a similar solution to what David suggested with having all parsing failures testable with llvm-dwarfdump, and I think that is a good goal to strive for.
>>> 
>>> Today we have woefully insufficient testing of failure cases in the DWARF parser, and I've encountered some odd edge cases (crashes and infinite loops) with malformed DWARF files during my development of the YAML tools which I hope to work on fixing.
>>> 
>>> 
>>> -Chris
>>>> On Jan 5, 2017, at 2:57 PM, David Blaikie via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>> 
>>> 
>>>> 
>>>> 
>>>> On Thu, Jan 5, 2017 at 2:35 PM Greg Clayton via Phabricator <reviews at reviews.llvm.org> wrote:
>>>> clayborg added inline comments.
>>>> 
>>>> 
>>>> ================
>>>> Comment at: unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp:1228
>>>> +  // the DWARFDie::end() iterator.
>>>> +  EXPECT_EQ(DWARFDie::iterator(Null), Null.end());
>>>> +}
>>>> ----------------
>>>> It isn't possible with the DWARF APIs currently without adding code that no one would want.
>>>> 
>>>> I was able to use DwarfGenerator to generate a DWARF file that was close to what we need, which I saved to disk. I then used obj2yaml to make a yaml file, edited it, and got what we need:
>>>> ```
>>>> --- !mach-o
>>>> FileHeader:
>>>>   magic:           0xFEEDFACF
>>>>   cputype:         0x01000007
>>>>   cpusubtype:      0x00000003
>>>>   filetype:        0x00000001
>>>>   ncmds:           2
>>>>   sizeofcmds:      248
>>>>   flags:           0x00000000
>>>>   reserved:        0x00000000
>>>> LoadCommands:
>>>>   - cmd:             LC_SEGMENT_64
>>>>     cmdsize:         232
>>>>     segname:         ''
>>>>     vmaddr:          0
>>>>     vmsize:          27
>>>>     fileoff:         280
>>>>     filesize:        27
>>>>     maxprot:         7
>>>>     initprot:        7
>>>>     nsects:          2
>>>>     flags:           0
>>>>     Sections:
>>>>       - sectname:        __debug_abbrev
>>>>         segname:         __DWARF
>>>>         addr:            0x0000000000000000
>>>>         size:            5
>>>>         offset:          0x00000118
>>>>         align:           0
>>>>         reloff:          0x00000000
>>>>         nreloc:          0
>>>>         flags:           0x02000000
>>>>         reserved1:       0x00000000
>>>>         reserved2:       0x00000000
>>>>         reserved3:       0x00000000
>>>>       - sectname:        __debug_info
>>>>         segname:         __DWARF
>>>>         addr:            0x000000000000000D
>>>>         size:            14
>>>>         offset:          0x00000125
>>>>         align:           0
>>>>         reloff:          0x00000000
>>>>         nreloc:          0
>>>>         flags:           0x02000000
>>>>         reserved1:       0x00000000
>>>>         reserved2:       0x00000000
>>>>         reserved3:       0x00000000
>>>>   - cmd:             LC_VERSION_MIN_MACOSX
>>>>     cmdsize:         16
>>>>     version:         1048576
>>>>     sdk:             0
>>>> DWARF:
>>>>   debug_abbrev:
>>>>     - Code:            0x00000001
>>>>       Tag:             DW_TAG_compile_unit
>>>>       Children:        DW_CHILDREN_yes
>>>>       Attributes:
>>>>   debug_info:
>>>>     - Length:          10
>>>>       Version:         4
>>>>       AbbrOffset:      0
>>>>       AddrSize:        8
>>>>       Entries:
>>>>         - AbbrCode:        0x00000001
>>>>           Values:
>>>>         - AbbrCode:        0x00000000
>>>>           Values:
>>>> ...
>>>> ```
>>>> 
>>>> Are you OK if I have this text in a "const char *" variable and use the yaml2obj APIs to create an in-memory file, then parse that DWARF and test this? I can't really use FileCheck to test this internal iteration API. Let me know if you are OK with this approach.
>>>> 
>>>> Chris - any ideas here? This seems like a canonical example of the sort of test coverage we want and neither of the directions being pursued seem to be covering it.
>>>> 
>>>> Are there avenues in either that would help here?
>>>> 
>>>> Honestly I'd be OK expanding LLVM's DWARF generation APIs to support this - but eventually for these parser tests we'll want to expand them to cover invalid DWARF which will be harder to justify/express there, perhaps.
>>>> 
>>>> - Dave
>>>>  
>>>> 
>>>> 
>>>> https://reviews.llvm.org/D28303
>>>> 
>>>> 
>>>> 
>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
