[cfe-dev] [RFC] New ClangDebuggerSupport Library

Tue Dec 13 10:41:07 PST 2016

On Tue, Dec 13, 2016 at 10:34 AM Chris Bieneman <cbieneman at apple.com> wrote:

> On Dec 13, 2016, at 8:56 AM, Greg Clayton <clayborg at gmail.com> wrote:
>
>
> On Dec 12, 2016, at 5:58 PM, Chris Bieneman <cbieneman at apple.com> wrote:
>
>
> On Dec 12, 2016, at 5:41 PM, Greg Clayton <clayborg at gmail.com> wrote:
>
> I made the dwarfgen::Generator because it can generate DWARF in memory
> without using files, but can save to ELF file if needed. This allows you to
> test DWARF APIs in gtests. Prior to this, we had only text based dumping
> using llvm-dwarfdump and FileCheck, so if the llvm-dwarfdump llvm tests
> were working, that was considered enough to make sure nothing else was
> broken.
>
> Most of what is done in YAML can also be done with the classes in the
> dwarfgen namespace, except that you would need to manually write the
> dwarfgen code, as there is currently no way to say "take this DWARF file and
> make a blob of text that I can use in a test case", but an intermediate
> representation can easily be made, and that format could easily be YAML.
> The YAML files are mostly for use in the llvm-dwarfdump + FileCheck arena,
> and the dwarfgen is currently for native llvm code in gtest use cases since
> we can actually test the DWARF APIs, not just text output.
>
> One major difference in the YAML format currently is that it is designed to
> serialize from a binary and deserialize in exactly the way the binary
> existed in DWARF. Take a DWARF file, generate the YAML from it and then use
> that later.
>
> The dwarfgen classes are designed to be a "create DWARF the way the user
> would want to create it and then generate me the DWARF blob I want".
>
> The YAML tools are going to be laid out just like the binary format:
>
> .debug_abbrev[0] code = 1, tag = DW_TAG_compile_unit, children = True,
> attrspecs = [ {attr = DW_AT_name, form = DW_FORM_strp}, {attr =
> DW_AT_low_pc, form = DW_FORM_addr} ]
> .debug_info = CU_header(...), code = 1, "main", 0x1000
>
> I know the above example isn't YAML, but I tried to keep it simple. Now if
> you wanted to add some new DWARF to the above info, it would be hard to do.
> You might throw off a CU-relative offset that follows the data you want to
> add, since the offset was encoded as a number (not as a label in the YAML,
> right?). So this is constructing DWARF manually by having to know the
> DWARF format. Note you write the .debug_abbrev separate and then output the
> .debug_info separate and must know the format of DWARF. Of course this is
> probably auto generated for you, so if you are always starting with a
> binary, then you are OK.
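
To illustrate the offset problem described above, here is a minimal C++ sketch. The `Entry` type and the `layout` function are hypothetical and are not part of the YAML tools or dwarfgen; they only model entries of a given size laid out back to back, so that an offset captured as a plain number goes stale when something is inserted ahead of its target, while a label-based lookup does not.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one encoded entry in a unit: a label and a byte size.
struct Entry {
  std::string Label;
  uint32_t Size;
};

// Computes the unit-relative offset of every label from the current layout.
std::map<std::string, uint32_t> layout(const std::vector<Entry> &Entries) {
  std::map<std::string, uint32_t> Offsets;
  uint32_t Offset = 0;
  for (const Entry &E : Entries) {
    assert(!Offsets.count(E.Label) && "labels must be unique");
    Offsets[E.Label] = Offset;
    Offset += E.Size;
  }
  return Offsets;
}
```

A number recorded from `layout` before an edit keeps its old value, whereas calling `layout` again after the edit yields the shifted offset. That is the difference between encoding a reference as a number and encoding it as a label.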
>
> In dwarfgen you write DWARF the way you want to write it with APIs that
> insulate you from the DWARF format a bit more:
>
> dwarfgen::Generator DG;
> dwarfgen::CompileUnit CU = DG.addCompileUnit();
> dwarfgen::DIE &CUDie = CU.getUnitDIE();
> CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
> CUDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000);
>
> The generator takes care of then generating the DWARF (making any
> abbreviations in .debug_abbrev it needs to and emitting the .debug_info)
> and the experience allows you to not worry how DWARF is generated, just the
> content you want to put into it.
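
As a rough sketch of the bookkeeping a generator can hide from the user, the following C++ assigns abbreviation codes by DIE "shape" (tag, children flag, and attribute/form list). The `AbbrevKey` and `AbbrevTable` names are hypothetical and this is not the dwarfgen implementation; only the numeric DWARF constants are taken from the DWARF standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Constant values as defined by the DWARF standard.
constexpr uint16_t DW_TAG_compile_unit = 0x11;
constexpr uint16_t DW_AT_name = 0x03;
constexpr uint16_t DW_AT_low_pc = 0x11;
constexpr uint16_t DW_FORM_addr = 0x01;
constexpr uint16_t DW_FORM_strp = 0x0e;

using AttrSpec = std::pair<uint16_t, uint16_t>; // (attribute, form)

// Hypothetical key describing the shape of a DIE.
struct AbbrevKey {
  uint16_t Tag;
  bool HasChildren;
  std::vector<AttrSpec> Specs;
  bool operator<(const AbbrevKey &O) const {
    return std::tie(Tag, HasChildren, Specs) <
           std::tie(O.Tag, O.HasChildren, O.Specs);
  }
};

// Hypothetical table that hands out one abbreviation code per distinct shape.
class AbbrevTable {
  std::map<AbbrevKey, uint32_t> Codes;

public:
  // Returns the existing code for this shape, or assigns the next free one.
  uint32_t getCode(const AbbrevKey &Key) {
    auto It = Codes.find(Key);
    if (It != Codes.end())
      return It->second;
    // Codes start at 1 because code 0 is reserved for null entries.
    uint32_t Code = static_cast<uint32_t>(Codes.size()) + 1;
    assert(Code != 0 && "abbreviation code overflow");
    Codes.emplace(Key, Code);
    return Code;
  }
  size_t size() const { return Codes.size(); }
};
```

With a table like this, two DIEs that carry the same attributes and forms share one .debug_abbrev entry, and adding an attribute to a DIE simply produces a new shape and a new code without any manual edits to the abbreviation section.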
>
> This format would be very easy to serialize and I would be happy to add
> a dwarfgen::Serializer class that could take a DWARF file and
> serialize it, but the format wouldn't be in the data centric model
> like YAML, but in the user centric format of the dwarfgen classes. Some
> YAML pseudo code would be something like:
>
> { CU_header(...), dies=
> [ DW_TAG_compile_unit, DW_CHILDREN_yes, attributes = [
>    {attr = DW_AT_name, form = DW_FORM_strp, value = "main"},
>    {attr = DW_AT_low_pc, form = DW_FORM_addr, value = 0x1000}
>  ]
> ]
> }
>
> Note the data here would be easy to modify and adding an attribute would
> be easier as it is all in one place and you can just add a line of code.
> With YAML you would need to add the attribute + form to .debug_abbrev and
> then add the new value in the .debug_info manually, which leaves room for
> error. This is only an issue if you want to actually try and modify the
> YAML manually. If this won't ever happen, then this doesn't matter.
>
> So we could take a GCC binary, serialize it, and put the text into a
> global variable that can be loaded in a gtest binary very easily using
> either method.
>
> The main question in my mind is: do we want to test the DWARF conversion
> to clang AST by comparing text files, or by using APIs. I would vote for
> the APIs as you compile code and have the clang::ASTContext in memory, then
> save the DWARF and load it, convert the types into another
> clang::ASTContext in memory and then compare two types, one from each
> clang::ASTContext. Or you can serialize one AST context and serialize
> another and make sure they are exactly the same. The latter seems like more
> work as I am not sure if we will always end up with a perfectly matched
> type and having AST comparisons in C++ code can help us work around any
> differences.
>
>
> Either I don't understand what you're saying here, or it doesn't actually
> solve the problem I'm trying to solve.
>
> When you say "compile code and have the clang::ASTContext in memory" do
> you mean take C/C++ source and convert it to an AST? Then write the DWARF,
> read the DWARF, and generate another AST from the DWARF to compare?
>
>
> Yep, that is what I am saying. This of course would work for testing the
> currently built clang since that would be needed in order to make the AST
> from source and generate the binary which can then be loaded by the LLVM
> DWARF parser and then converted to another AST and compared. This won't work
> for a canned GCC binary since we won't have the original AST from source.
>
>
> AST<->AST testing is not sufficient for what I need to test. In order for
> the code in the ClangDebuggerSupport library to replace the code in LLDB it
> *must* work for GCC-generated DWARF. Investing time in a testing
> infrastructure that only works for the current version of Clang doesn't
> meet my needs, and isn't going to be high on my priority list.
>
>
> Chris, did you plan on just making the test contain things to look for in
> the generated AST by doing things manually in the case where we have canned
> input from other compilers? Maybe if we can expound on this it might show
> the need for the YAML solution?
>
>
> I plan to generate test cases by taking source programs, compiling them
> with Clang and other compilers, dumping the DWARF to YAML, and dumping the
> Clang AST. I can use the Clang AST (and the ASTs from other compilers if
> available) to get an idea for what to expect from the DWARF->AST generation
> step. I will then use the YAML as the input to FileCheck tests converting
> YAML->DWARF->AST.
>
>
> If that is what you mean, it doesn't solve the problem that we need
> solved. We need to go from DWARF generated by GCC, ICC, Clang, and multiple
> versions of all of the above, and convert that to a clang AST that is
> reasonable for LLDB's use.
>
>
> This can be done by serializing DWARF either from dwarfgen or from your
> Obj to YAML. Both can do it. Both will be able to generate DWARF. We can
> verify that both can generate stuff byte for byte if needed.
>
>
> The current proposed dwarfgen APIs are not designed for bit-for-bit
> identical encoding of DWARF.
>

This seems weird though - why would the AST generation tests need more
fidelity than the underlying DWARF parsing APIs they rely on?

> This makes them ill-suited for generating test cases from other compilers.
> We could add that support, however I still feel the APIs are ill-suited for
> some of the kinds of large test cases that I'm intending to produce from
> YAML.
>

Perhaps we can focus on that point, unless there's something about the
fidelity argument I'm missing, such that the AST generation tests need
more fidelity than the underlying DWARF parsing API tests.

>
>
> Let me describe the difference between the purpose behind my
> infrastructure and Greg's in terms of existing LLVM infrastructure, using
> the MC layer tests as an example.
>
> Greg's new APIs are ideal for writing tests like what you might find in
> llvm/unittests/MC/Disassembler.cpp. Those tests initialize targets and send
> small byte streams into the disassembler then verify the outputs. Greg's
> APIs are a little more complicated than that, but generally they are
> designed around creating small bits of DWARF data, writing it to a buffer
> then reading it back.
>
> My YAML infrastructure is more suited to writing the kinds of tests you
> would find in llvm/test/MC/Disassembler/. Those tests are text files with
> hex values that are read, converted to binary and disassembled. It is
> better suited to large tests that would take a lot of code to generate in
> gtest format.
>
> Both approaches are about making it easy to write specific (and different)
> types of tests, and both are part of an effort we're making to improve the
> testability of LLVM and LLDB's DWARF code.
>
>
> They are both valid. My one question for the YAML stuff is why are we
> trying to encode this into YAML and not just a collection of bytes? If we
> aren't going to be able to really edit the YAML then why go through all of
> the pain to encode it as DWARF YAML if this is effectively just saying
> output these exact bytes?
>
>
> I find it useful to have the data encoded in a human readable format even
> if it isn't human editable. I am *really* bad at reading hex-encoded ULEB
> values, and DWARF's encoding is very complex (it requires a Turing-complete
> state machine to parse line tables).
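
For readers unfamiliar with the ULEB128 values mentioned above, here is a minimal, self-contained C++ sketch of decoding one. The function name and signature are chosen for illustration only. Each byte contributes its low seven bits, least significant group first, and a set high bit means another byte follows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes one ULEB128 value starting at Bytes[Pos] and advances Pos past it.
uint64_t decodeULEB128(const std::vector<uint8_t> &Bytes, size_t &Pos) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  while (true) {
    assert(Pos < Bytes.size() && "truncated ULEB128");
    uint8_t Byte = Bytes[Pos++];
    // Ignore payload bits that would not fit in 64 bits (overlong input).
    if (Shift < 64)
      Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    // A clear high bit marks the last byte of the value.
    if ((Byte & 0x80) == 0)
      break;
  }
  return Value;
}
```

For example, the three bytes `e5 8e 26` decode to 624485, which is not obvious from the hex alone and is the kind of value a readable dump spares the reader from working out by hand.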
>

If they're not human editable, we could have a tool that generates comments
(or, indeed, the dwarfdump tool we already have generates human readable
info from the binary data) - we wouldn't need a tool to read that format as
well.

>
>
> -Chris
>
>
> So I think we should think about how we would test the DWARF to AST
> conversion with gtest or FileCheck and then pick the easier solution.
>
>
> But I do think it is still valid to think about how we are going to
> actually test this before we pick a solution.
>
>
> I think both test formats have their benefits and certain types of tests
> will be easier in each format. Having both gives us the ability to choose
> which format we want based on what we're trying to test.
>

I'm concerned about the cost (code, understanding, etc - having more tools/more
ways of doing things means more ramp-up and possibly confusion over how to
do things for everyone on the project) & want to have a clear idea of the
need for these different things for doing apparently similar/overlapping
work.

- Dave

>
> -Chris
>
>
> Greg
>
>
> On Dec 12, 2016, at 4:59 PM, Chris Bieneman <cbieneman at apple.com> wrote:
>
>
> On Dec 12, 2016, at 4:40 PM, David Blaikie via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>
>
> On Mon, Dec 12, 2016 at 4:23 PM Chris Bieneman <cbieneman at apple.com>
> wrote:
>
> On Dec 12, 2016, at 4:13 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Mon, Dec 12, 2016 at 4:09 PM Chris Bieneman <cbieneman at apple.com>
> wrote:
> David, the two approaches address very different problems.
>
> The YAML tools are focused on a bit-for-bit identical round trip path for
> DWARF into and out of YAML. The goal with that work is to be able to
> generate a test suite from the output of many different versions of many
> different compilers. This is specifically with the goal of creating
> LIT-style tests that read DWARF and operate on it.
>
> Ah, thanks for explaining.
>
> These tests wouldn't appear in LLVM/Clang's test suite, then, right? So
> normal regression tests for the ClangDebuggerSupport library would be
> written as unit tests using Greg's DWARF-generation library?
>
>
> My goal is actually to have reduced test cases based on the YAML tools in
> the clang test suite. LLDB's use of clang APIs with dwarf generated by
> mismatched compilers is the source of many issues for the debugger, so
> having basic testing of DWARF generated by alternate compilers in Clang is
> highly desirable.
>
> Well, having DWARF that's representative of that generated by alternate
> compilers is important - and it seems like Greg's work on the unit test API
> for creating DWARF should still allow that. Seems reasonable to continue to
> enhance that to produce any DWARF we care about (since we'll need to
> generate it to test the DWARF parsing APIs - so that's a prerequisite
> before we worry about whether the ClangDebuggerSupport library can do
> something sensible with it, right?)
>
>
> I haven't dug too deep into Greg's work (although I certainly will). Where
> it makes sense I may even try and leverage his APIs in the YAML tools (as I
> have been leveraging the existing DWARF parser).
>
> In my (limited) discussions with Greg, it didn't seem like creating
> bit-for-bit identical DWARF was something his APIs were suited to.
>
> In YAML I've made the textual representation mirror the binary
> representation to a degree that the translation from YAML to binary has
> very little logic to it. As a point of context the YAML->DWARF
> implementation for dumping debug_abbrev, debug_str, and debug_aranges is
> under 100 lines of code.
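
To give a sense of how little logic such a translation can need, here is a hedged C++ sketch for a string section. It is not the yaml2obj code; `StringSection` and `emitDebugStr` are hypothetical names. It relies only on the fact that .debug_str is a sequence of NUL-terminated strings, with DW_FORM_strp attributes referring to a string by its byte offset in the section.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type: the raw section bytes plus the offset of each
// string, which is the value a DW_FORM_strp attribute would store.
struct StringSection {
  std::vector<char> Bytes;
  std::vector<uint32_t> Offsets;
};

// Emits the strings in order, each followed by a single NUL terminator.
StringSection emitDebugStr(const std::vector<std::string> &Strings) {
  StringSection Out;
  for (const std::string &S : Strings) {
    // An embedded NUL would end the string early when read back.
    assert(S.find('\0') == std::string::npos && "embedded NUL in string");
    Out.Offsets.push_back(static_cast<uint32_t>(Out.Bytes.size()));
    Out.Bytes.insert(Out.Bytes.end(), S.begin(), S.end());
    Out.Bytes.push_back('\0');
  }
  return Out;
}
```

Because the textual form lists the strings in section order, writing them back out reproduces the original bytes without any layout decisions being made by the tool.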
>
>
>
>
> Large tests generated from other compilers on raw source I would expect to
> appear in something like the test-suite, rather than in an LLVM project's
> regression or unit test suite.
>
>
> Large tests will certainly not be included in the clang test suite. YAML
> representations of DWARF should enable us to make reduced test cases in
> many situations, and where we cannot we will put the test in an external
> suite.
>
>
> Why the need for round tripping, then? Would it be sufficient for the
> test-suite to have binaries checked in next to info about what compiler
> generated them?
>
>
> The benefit of supporting round tripping in and out of a text-based format
> is that we may be able to reduce the test cases to things that we can
> include in the Clang test suite.
>
> (& why not just have the source checked in & run a variety of buildbot
> configurations (or one meta-configuration that could enumerate a variety of
> compilers) with different host compilers to test the behavior? That's how
> GDB's test suite works (for better and worse, don't get me wrong - there
> are things that could be improved from that position))
>
>
> This is actually basically how the LLDB test suite works. There is one
> huge drawback to this. Not everyone has access to every compiler we want to
> support, and certainly most people don't have them all installed. As a
> result having source-based tests means that many people may not be able to
> reproduce test failures locally. Using YAML encodings to generate the
> binary DWARF removes the compiler from the picture, and allows everyone to
> test every compiler's output.
>
> Fair - so why YAML rather than something more like the unit tests Greg's
> working on in LLVM?
>
>
> I mostly gravitated to YAML because I have experience using YAML-based
> tests for libObject code, and have found it very useful to be able to
> translate binaries in and out of YAML for testing.
>
>
> (this is clearly my preference - to use the unit test type API, since in
> both Greg and your case, you're testing an API, not a tool, so it seems
> cool/fine/reasonable to have an API for generating the input.
>
>
> I actually expect in my use case that I'll be testing both APIs and one or
> more tools. My intention is to write a tool that reads dwarf and dumps
> Clang ASTs. For that purpose having a YAML->DWARF generator is ideal.
>
> Also for my use case YAML has an added advantage that when a user reports
> an issue I can either take a binary or YAML file from the user, and
> textually reduce that down to a test case which could live in-tree.
>
>
> But the alternative question would be: Why not test the LLVM DWARF parsing
> API Greg's testing, with this yaml input instead of the unit test API?)
>
>
> Personally, I think having both types of tests is valuable. Unit tests of
> APIs are particularly valuable for writing small-grained tests, with
> limited input sizes. When I start running down the path of constructing
> Clang ASTs from complex C++ programs the code required to generate that
> DWARF in a unit test could be substantial, and that would make it a lot
> harder to write tests.
>
> Converting a binary to a YAML file is easy; hand crafting DWARF from APIs
> might not be.
>
> -Chris
>
>
>
> -Chris
>
>
> - Dave
>
>
> -Chris
>
> On Dec 12, 2016, at 3:57 PM, David Blaikie via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> I realize work is already underway/being committed here, but figured
> discussing the following in this thread rather than on some random commit
> email might be better.
>
> We now have two ways of generating DWARF, both committed in relation to a
> similar effort to integrate LLDB better with the rest of the LLVM project.
>
> There's this YAML effort, to help test the library that will allow the
> generation of Clang ASTs from DWARF. (currently such code resides in LLDB,
> and the proposal here is to roll it up into Clang)
>
> Then there's Greg's effort to provide a unit test API for generating DWARF
> for unit testing LLVM's DWARF parsing APIs for use in LLDB (currently what
> LLVM has was a fork of LLDB's, and Greg's working on reconciling that,
> rolling in LLDB's post-fork features, then migrating LLDB to use the fully
> featured LLVM version)
>
> Why are these done in two different ways? They seem like really similar
> use cases - generating DWARF for the purpose of testing some (LLVM or
> Clang) API that consumes DWARF bytes.
>
> Could we resolve this in favor of one approach or the other - I'm somewhat
> partial to the API approach & writing unit tests against the
> ClangDebuggerSupport library, myself.
>
> - David
>
> On Wed, Nov 9, 2016 at 2:26 PM Chris Bieneman via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> Hello cfe-dev,
>
> I would like to propose a new Clang library for implementing functionality
> that is used by LLDB. I see this as the first step in a long process of
> refactoring the language interfaces for LLDB.
>
> The short-term goal for this library is to be a place for us to rebuild
> functionality that exists in LLDB today and relies heavily on the
> implementation of Clang. As we rebuild the functionality we will build a
> suite of testing tools in Clang that exercise this library and more general
> Clang functionality in the same ways that LLDB will.
>
> As bits of functionality become fully implemented and tested, we will
> migrate LLDB to using the Clang implementations, allowing LLDB to remove
> its own copies. This will provide the Clang community with a higher
> confidence that changes in Clang do not break LLDB, and it will provide
> LLDB with better test coverage of the Clang functionality.
>
> The long-term goal of this library is to provide the implementation for
> what could some day become a defined debugger<->frontend interface for
> providing modularized (maybe even plugin-based) language debugging support
> in LLDB. In the distant future I could see us being able to tell people
> building new frontends that we have a defined interface they need to
> implement for the debugger, and once implemented the debugger should “Just
> Work”.
>
> The first bit of functionality that I would like to build up into the
> ClangDebuggerSupport library is materialization of Clang AST types from
> DWARF. To support this development I intend to add a new tool in Clang that
> reads DWARF types, generates a Clang AST, and prints the AST. I will also
> add DWARF support to obj2yaml and yaml2obj, so we will be able to write
> YAML LIT tests for the functionality.
>
> If people are in favor of this general approach I’ll begin working in this
> direction, and I’ll probably add the new library sometime next month.
>
> Thoughts?
> -Chris
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev