[cfe-dev] [RFC] New ClangDebuggerSupport Library

Tue Dec 13 10:34:53 PST 2016

> On Dec 13, 2016, at 8:56 AM, Greg Clayton <clayborg at gmail.com> wrote:
> 
>> 
>> On Dec 12, 2016, at 5:58 PM, Chris Bieneman <cbieneman at apple.com> wrote:
>> 
>>> 
>>> On Dec 12, 2016, at 5:41 PM, Greg Clayton <clayborg at gmail.com> wrote:
>>> 
>>> I made the dwarfgen::Generator because it can generate DWARF in memory without using files, but can save to an ELF file if needed. This allows you to test DWARF APIs in gtests. Prior to this, we had only text-based dumping using llvm-dwarfdump and FileCheck, so if the llvm-dwarfdump llvm tests were working, that was considered enough to make sure nothing else was broken.
>>> 
>>> Most of the stuff that is done in YAML can also be done in the classes in the dwarfgen namespace, except you would need to manually write the dwarfgen code, as there is currently no way to say "take this DWARF file and make a blob of text that I can use in a test case". An intermediate representation can easily be made, though, and that format could easily be YAML. The YAML files are mostly for use in the llvm-dwarfdump + FileCheck arena, and dwarfgen is currently for native llvm code in gtest use cases, since there we can actually test the DWARF APIs, not just text output.
>>> 
>>> One major difference in the YAML format currently is that it is designed to serialize from a binary and deserialize in exactly the way the binary existed in DWARF. Take a DWARF file, generate the YAML from it and then use that later.
>>> 
>>> The dwarfgen classes are designed to be a "create DWARF the way the user would want to create it and then generate me the DWARF blob I want". 
>>> 
>>> The YAML tools are going to be laid out just like the binary format:
>>> 
>>> .debug_abbrev[0] code = 1, tag = DW_TAG_compile_unit, children = True, attrspecs = [ {attr = DW_AT_name, form = DW_FORM_strp}, {attr = DW_AT_low_pc, form = DW_FORM_addr} ]
>>> .debug_info = CU_header(...), code = 1, "main", 0x1000
>>> 
>>> I know the above example isn't YAML, but I tried to keep it simple. Now if you wanted to add some new DWARF to the above info, it would be hard to do. You might throw off a CU-relative offset that follows the data you want to add, since the offset was encoded as a number (not as a label in the YAML, right?). So this is constructing DWARF manually by having to know the DWARF format. Note that you write the .debug_abbrev separately, then output the .debug_info separately, and must know the format of DWARF. Of course this is probably auto-generated for you, so if you are always starting with a binary, then you are OK.
>>> 
>>> In dwarfgen you write DWARF the way you want to write it with APIs that insulate you from the DWARF format a bit more:
>>> 
>>> dwarfgen::Generator DG;
>>> dwarfgen::CompileUnit CU = DG.addCompileUnit();
>>> dwarfgen::DIE &CUDie = CU.getUnitDIE();
>>> CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
>>> CUDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000);
>>> 
>>> The generator takes care of then generating the DWARF (making any abbreviations in .debug_abbrev it needs to and emitting the .debug_info) and the experience allows you to not worry how DWARF is generated, just the content you want to put into it. 
>>> 
>>> This format would be very easy to serialize, and I would be happy to add a dwarfgen::Serializer class that could take a DWARF file and serialize it, but the format wouldn't be in the data-centric model like YAML, but in the user-centric format of the dwarfgen classes. Some YAML pseudo code would be something like:
>>> 
>>> { CU_header(...), dies= 
>>> [ DW_TAG_compile_unit, DW_CHILDREN_yes, attributes = [ 
>>>    {attr = DW_AT_name, form = DW_FORM_strp, value = "main"}, 
>>>    {attr = DW_AT_low_pc, form = DW_FORM_addr, value = 0x1000} 
>>>  ]
>>> ]
>>> }
>>> 
>>> Note the data here would be easy to modify, and adding an attribute would be easier as it is all in one place and you can just add a line of code. With YAML you would need to add the attribute + form to .debug_abbrev and then add the new value in the .debug_info manually, which leaves room for error. This is only an issue if you want to actually try and modify the YAML manually. If this won't ever happen, then this doesn't matter. 
>>> 
>>> So we could take a GCC binary, serialize it, and put the text into a global variable that can be loaded in a gtest binary very easily using either method.
>>> 
>>> The main question in my mind is: do we want to test the DWARF conversion to clang AST by comparing text files, or by using APIs? I would vote for the APIs: you compile code and have the clang::ASTContext in memory, then save the DWARF and load it, convert the types into another clang::ASTContext in memory, and then compare two types, one from each clang::ASTContext. Or you can serialize one AST context and serialize another and make sure they are exactly the same. The latter seems like more work, as I am not sure we will always end up with a perfectly matched type, and having AST comparisons in C++ code can help us work around any differences.
>> 
>> Either I don't understand what you're saying here, or it doesn't actually solve the problem I'm trying to solve.
>> 
>> When you say "compile code and have the clang::ASTContext in memory" do you mean take C/C++ source and convert it to an AST? Then write the DWARF, read the DWARF, and generate another AST from the DWARF to compare?
> 
> Yep, that is what I am saying. This of course would work for testing the currently built clang, since that would be needed in order to make the AST from source and generate the binary, which can then be loaded by the LLVM DWARF parser and converted to another AST to compare. This won't work for a canned GCC binary since we won't have the original AST from source. 

AST<->AST testing is not sufficient for what I need to test. In order for the code in the ClangDebuggerSupport library to replace the code in LLDB it *must* work for GCC-generated DWARF. Investing time in a testing infrastructure that only works for the current version of Clang doesn't meet my needs, and isn't going to be high on my priority list.

> 
> Chris, did you plan on just making the test contain things to look for in the generated AST by doing things manually in the case where we have canned input from other compilers? Maybe if we can expound on this it might show the need for the YAML solution?

I plan to generate test cases by taking source programs, compiling them with Clang and other compilers, dumping the DWARF to YAML, and dumping the Clang AST. I can use the Clang AST (and the ASTs from other compilers if available) to get an idea for what to expect from the DWARF->AST generation step. I will then use the YAML as the input to FileCheck tests converting YAML->DWARF->AST.

>> 
>> If that is what you mean, it doesn't solve the problem that we need solved. We need to go from DWARF generated by GCC, ICC, Clang, and multiple versions of all of the above, and convert that to a clang AST that is reasonable for LLDB's use.
> 
> This can be done by serializing DWARF either from dwarfgen or from your Obj to YAML. Both can do it. Both will be able to generate DWARF. We can verify that both can generate stuff byte for byte if needed.

The currently proposed dwarfgen APIs are not designed for bit-for-bit identical encoding of DWARF. This makes them ill-suited for generating test cases from other compilers. We could add that support; however, I still feel the APIs are ill-suited for some of the kinds of large test cases that I'm intending to produce from YAML.
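
To make the gap concrete: as described above, the generator creates whatever .debug_abbrev entries it needs from the DIEs you build. Below is a minimal sketch of that idea, assuming a simple "one abbreviation per distinct DIE shape" scheme. The types AttrSpec, DIEShape, and AbbrevTable are hypothetical and made up for this illustration; they are not the dwarfgen classes and this is not how dwarfgen is necessarily implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical model types for illustration; not part of dwarfgen.
struct AttrSpec {
  uint16_t Attr; // a DW_AT_* constant
  uint16_t Form; // a DW_FORM_* constant
  bool operator<(const AttrSpec &O) const {
    return std::tie(Attr, Form) < std::tie(O.Attr, O.Form);
  }
};

struct DIEShape {
  uint16_t Tag;                // a DW_TAG_* constant
  bool HasChildren;            // DW_CHILDREN_yes / DW_CHILDREN_no
  std::vector<AttrSpec> Specs; // attribute/form pairs in emission order
  bool operator<(const DIEShape &O) const {
    return std::tie(Tag, HasChildren, Specs) <
           std::tie(O.Tag, O.HasChildren, O.Specs);
  }
};

// Hands out one abbreviation code per distinct DIE shape. Codes start at 1
// because abbreviation code 0 is reserved in DWARF.
class AbbrevTable {
  std::map<DIEShape, uint32_t> Codes;

public:
  uint32_t getCode(const DIEShape &Shape) {
    auto It = Codes.find(Shape);
    if (It != Codes.end())
      return It->second; // shape already seen: reuse its code
    uint32_t Code = static_cast<uint32_t>(Codes.size()) + 1;
    Codes.emplace(Shape, Code);
    return Code;
  }
  std::size_t size() const { return Codes.size(); }
};
```

In this sketch codes are assigned in first-use order, so the numbering and ordering of .debug_abbrev need not match what another compiler emitted for the same logical content. A round-trip format has to record those choices explicitly instead.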

>> 
>> Let me describe the difference between the purpose behind my infrastructure and Greg's in terms of existing LLVM infrastructure, using the MC layer tests as an example.
>> 
>> Greg's new APIs are ideal for writing tests like what you might find in llvm/unittests/MC/Disassembler.cpp. Those tests initialize targets and send small byte streams into the disassembler then verify the outputs. Greg's APIs are a little more complicated than that, but generally they are designed around creating small bits of DWARF data, writing it to a buffer then reading it back.
>> 
>> My YAML infrastructure is more suited to writing the kinds of tests you would find in llvm/test/MC/Disassembler/. Those tests are text files with hex values that are read, converted to binary and disassembled. It is better suited to large tests that would take a lot of code to generate in gtest format.
>> 
>> Both approaches are about making it easy to write specific (and different) types of tests, and both are part of an effort we're making to improve the testability of LLVM and LLDB's DWARF code.
> 
> They are both valid. My one question for the YAML stuff is why are we trying to encode this into YAML and not just a collection of bytes? If we aren't going to be able to really edit the YAML then why go through all of the pain to encode it as DWARF YAML if this is effectively just saying output these exact bytes?

I find it useful to have the data encoded in a human-readable format even if it isn't human-editable. I am *really* bad at reading hex-encoded ULEB values, and DWARF's encoding is very complex (parsing line tables requires running a state machine over an opcode stream).
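
For anyone who hasn't had to read these by hand, here is a minimal standalone sketch of ULEB128 decoding. The function name readULEB128 and its signature are made up for this illustration; LLVM has its own helpers in llvm/Support/LEB128.h, which is what real code should use.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration. Decodes one ULEB128 value from
// Data[0..Size). On success, stores the number of bytes consumed in *Length
// and returns the value. If the input ends before a terminating byte is
// seen, sets *Length to 0 and returns 0. Values wider than 64 bits are not
// diagnosed in this sketch; their high bits are silently dropped.
uint64_t readULEB128(const uint8_t *Data, std::size_t Size,
                     std::size_t *Length) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  for (std::size_t I = 0; I < Size; ++I) {
    uint8_t Byte = Data[I];
    // The low 7 bits of each byte are payload, least significant group first.
    if (Shift < 64)
      Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    // A clear high bit marks the last byte of the value.
    if ((Byte & 0x80) == 0) {
      *Length = I + 1;
      return Value;
    }
  }
  *Length = 0;
  return 0;
}
```

The bytes E5 8E 26 decode to 624485, which is not obvious from the hex. Note also that 01 and 81 00 both decode to 1, so the same value can have more than one byte encoding.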

> 
>> -Chris
>> 
>>> 
>>> So I think we should think about how we would test the DWARF to AST conversion with gtest or FileCheck and then pick the easier solution.
> 
> But I do think it is still valid to think about how we are going to actually test this before we pick a solution. 

I think both test formats have their benefits and certain types of tests will be easier in each format. Having both gives us the ability to choose which format we want based on what we're trying to test.

-Chris

> 
>>> Greg
>>> 
>>> 
>>>> On Dec 12, 2016, at 4:59 PM, Chris Bieneman <cbieneman at apple.com> wrote:
>>>> 
>>>>> 
>>>>> On Dec 12, 2016, at 4:40 PM, David Blaikie via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Dec 12, 2016 at 4:23 PM Chris Bieneman <cbieneman at apple.com> wrote:
>>>>>> On Dec 12, 2016, at 4:13 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Dec 12, 2016 at 4:09 PM Chris Bieneman <cbieneman at apple.com> wrote:
>>>>>> David, the two approaches address very different problems.
>>>>>> 
>>>>>> The YAML tools are focused on a bit-for-bit identical round trip path for DWARF into and out of YAML. The goal with that work is to be able to generate a test suite from the output of many different versions of many different compilers. This is specifically with the goal of creating LIT-style tests that read DWARF and operate on it.
>>>>>> 
>>>>>> Ah, thanks for explaining.
>>>>>> 
>>>>>> These tests wouldn't appear in LLVM/Clang's test suite, then, right? So normal regression tests for the ClangDebuggerSupport library would be written as unit tests using Greg's DWARF-generation library?
>>>>> 
>>>>> My goal is actually to have reduced test cases based on the YAML tools in the clang test suite. LLDB's use of clang APIs with dwarf generated by mismatched compilers is the source of many issues for the debugger, so having basic testing of DWARF generated by alternate compilers in Clang is highly desirable.
>>>>> 
>>>>> Well, having DWARF that's representative of that generated by alternate compilers is important - and it seems like Greg's work on the unit test API for creating DWARF should still allow that. Seems reasonable to continue to enhance that to produce any DWARF we care about (since we'll need to generate it to test the DWARF parsing APIs - so that's a prerequisite before we worry about whether the ClangDebuggerSupport library can do something sensible with it, right?)
>>>> 
>>>> I haven't dug too deep into Greg's work (although I certainly will). Where it makes sense I may even try and leverage his APIs in the YAML tools (as I have been leveraging the existing DWARF parser).
>>>> 
>>>> In my (limited) discussions with Greg, it didn't seem like creating bit-for-bit identical DWARF was something his APIs were suited to.
>>>> 
>>>> In YAML I've made the textual representation mirror the binary representation to a degree that the translation from YAML to binary has very little logic to it. As a point of context the YAML->DWARF implementation for dumping debug_abbrev, debug_str, and debug_aranges is under 100 lines of code.
>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Large tests generated from other compilers on raw source I would expect to appear in something like the test-suite, rather than in an LLVM project's regression or unit test suite.
>>>>> 
>>>>> Large tests will certainly not be included in the clang test suite. YAML representations of DWARF should enable us to make reduced test cases in many situations, and where we cannot we will put the test in an external suite.
>>>>> 
>>>>>> 
>>>>>> Why the need for round tripping, then? Would it be sufficient for the test-suite to have binaries checked in next to info about what compiler generated them?
>>>>> 
>>>>> The benefit of supporting round tripping in and out of a text-based format is that we may be able to reduce the test cases to things that we can include in the Clang test suite.
>>>>> 
>>>>>> (& why not just have the source checked in & run a variety of buildbot configurations (or one meta-configuration that could enumerate a variety of compilers) with different host compilers to test the behavior? That's how GDB's test suite works (for better and worse, don't get me wrong - there are things that could be improved from that position))
>>>>> 
>>>>> This is actually basically how the LLDB test suite works. There is one huge drawback to this. Not everyone has access to every compiler we want to support, and certainly most people don't have them all installed. As a result having source-based tests means that many people may not be able to reproduce test failures locally. Using YAML encodings to generate the binary DWARF removes the compiler from the picture, and allows everyone to test every compiler's output.
>>>>> 
>>>>> Fair - so why YAML rather than something more like the unit tests Greg's working on in LLVM?
>>>> 
>>>> I mostly gravitated to YAML because I have experience using YAML-based tests for libObject code, and have found it very useful to be able to translate binaries in and out of YAML for testing.
>>>> 
>>>>> 
>>>>> (this is clearly my preference - to use the unit test type API, since in both Greg and your case, you're testing an API, not a tool, so it seems cool/fine/reasonable to have an API for generating the input.
>>>> 
>>>> I actually expect in my use case that I'll be testing both APIs and one or more tools. My intention is to write a tool that reads dwarf and dumps Clang ASTs. For that purpose having a YAML->DWARF generator is ideal.
>>>> 
>>>> Also for my use case YAML has an added advantage that when a user reports an issue I can either take a binary or YAML file from the user, and textually reduce that down to a test case which could live in-tree.
>>>> 
>>>>> 
>>>>> But the alternative question would be: Why not test the LLVM DWARF parsing API Greg's testing, with this yaml input instead of the unit test API?)
>>>> 
>>>> Personally, I think having both types of tests is valuable. Unit tests of APIs are particularly valuable for writing small-grained tests with limited input sizes. When I start running down the path of constructing Clang ASTs from complex C++ programs, the code required to generate that DWARF in a unit test could be substantial, and that would make it a lot harder to write tests.
>>>> 
>>>> Converting a binary to a YAML file is easy; hand-crafting DWARF from APIs might not be.
>>>> 
>>>> -Chris
>>>> 
>>>>> 
>>>>> 
>>>>> -Chris
>>>>> 
>>>>>> 
>>>>>> - Dave
>>>>>> 
>>>>>> 
>>>>>> -Chris
>>>>>> 
>>>>>>> On Dec 12, 2016, at 3:57 PM, David Blaikie via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>>>>> 
>>>>>>> I realize work is already underway/being committed here, but figured discussing the following in this thread rather than on some random commit email might be better.
>>>>>>> 
>>>>>>> We now have two ways of generating DWARF, both committed in relation to a similar effort to integrate LLDB better with the rest of the LLVM project.
>>>>>>> 
>>>>>>> There's this YAML effort, to help test the library that will allow the generation of Clang ASTs from DWARF. (Currently such code resides in LLDB, and the proposal here is to roll it up into Clang.)
>>>>>>> 
>>>>>>> Then there's Greg's effort to provide a unit test API for generating DWARF for unit testing LLVM's DWARF parsing APIs for use in LLDB (currently what LLVM has was a fork of LLDB's, and Greg's working on reconciling that, rolling in LLDB's post-fork features, then migrating LLDB to use the fully featured LLVM version)
>>>>>>> 
>>>>>>> Why are these done in two different ways? They seem like really similar use cases - generating DWARF for the purpose of testing some (LLVM or Clang) API that consumes DWARF bytes.
>>>>>>> 
>>>>>>> Could we resolve this in favor of one approach or the other - I'm somewhat partial to the API approach & writing unit tests against the ClangDebuggerSupport library, myself.
>>>>>>> 
>>>>>>> - David
>>>>>>> 
>>>>>>> On Wed, Nov 9, 2016 at 2:26 PM Chris Bieneman via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>>>>> Hello cfe-dev,
>>>>>>> 
>>>>>>> I would like to propose a new Clang library for implementing functionality that is used by LLDB. I see this as the first step in a long process of refactoring the language interfaces for LLDB.
>>>>>>> 
>>>>>>> The short-term goal is for this library to be a place for us to rebuild functionality that exists in LLDB today and relies heavily on the implementation of Clang. As we rebuild the functionality we will build a suite of testing tools in Clang that exercise this library and more general Clang functionality in the same ways that LLDB will.
>>>>>>> 
>>>>>>> As bits of functionality become fully implemented and tested, we will migrate LLDB to using the Clang implementations, allowing LLDB to remove its own copies. This will provide the Clang community with a higher confidence that changes in Clang do not break LLDB, and it will provide LLDB with better test coverage of the Clang functionality.
>>>>>>> 
>>>>>>> The long-term goal of this library is to provide the implementation for what could some day become a defined debugger<->frontend interface for providing modularized (maybe even plugin-based) language debugging support in LLDB. In the distant future I could see us being able to tell people building new frontends that we have a defined interface they need to implement for the debugger, and once implemented the debugger should “Just Work”.
>>>>>>> 
>>>>>>> The first bit of functionality that I would like to build up into the ClangDebuggerSupport library is materialization of Clang AST types from DWARF. To support this development I intend to add a new tool in Clang that reads DWARF types, generates a Clang AST, and prints the AST. I will also add DWARF support to obj2yaml and yaml2obj, so we will be able to write YAML LIT tests for the functionality.
>>>>>>> 
>>>>>>> If people are in favor of this general approach I’ll begin working in this direction, and I’ll probably add the new library sometime next month.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> -Chris
>>>>>>> _______________________________________________
>>>>>>> cfe-dev mailing list
>>>>>>> cfe-dev at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
