[llvm-dev] DWARF Generator

Greg Clayton via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 18 08:43:05 PST 2016

> On Nov 17, 2016, at 5:40 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
>> -----Original Message-----
>> From: Greg Clayton [mailto:gclayton at apple.com]
>> Sent: Thursday, November 17, 2016 5:01 PM
>> To: David Blaikie
>> Cc: llvm-dev at lists.llvm.org; Robinson, Paul; Eric Christopher; Adrian
>> Prantl
>> Subject: Re: [llvm-dev] DWARF Generator
>>> On Nov 17, 2016, at 3:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>> On Thu, Nov 17, 2016 at 3:12 PM Greg Clayton via llvm-dev <llvm-
>> dev at lists.llvm.org> wrote:
>>> I have recently been modifying the DWARF parser and have more patches
>> planned and I want to be able to add unit tests that test the internal
>> llvm DWARF APIs to ensure they continue to work and also validate the
>> changes that I am making. There are not many DWARF unit tests other than
>> very simple ones that test DWARF forms currently. I would like to expand
>> this to include many more tests.
>>> I had submitted a patch that I aborted as it was too large. One of the
>> issues with the patch was a standalone DWARF generator that can turn a
>> few API calls into the section data required for the DWARFContextInMemory
>> class to be able to load DWARF from. The idea is to generate a small blurb
>> of DWARF, parse it using our built in DWARF parser and validate that the
>> API calls we do when consuming the DWARF match what we expect. The
>> original standalone DWARF generator class is in
>> unittests/DebugInfo/DWARF/DWARFGenerator2.{h,cpp} in the patch attached.
>> The original review suggested that I try to use the AsmPrinter and many of
>> its associated classes to generate the DWARF. I attempted to do so and the
>> AsmPrinter version is in lib/CodeGen/DwarfGenerator.{h,cpp} in the patch
>> attached. This AsmPrinter based code steals code from the DwarfLinker.cpp.
>>> I am having trouble getting things to work with the AsmPrinter. I was
>> able to get simple DWARF to be emitted with the AsmPrinter version of the
>> DWARF generator with code like:
>>>   initLLVM();
>>>   DwarfGen DG;
>>>   Triple Triple("x86_64--");
>>>   StringRef Path("/tmp/test.elf");
>>>   bool DwarfInitSuccess = DG.init(Triple, Path);
>>>   EXPECT_TRUE(DwarfInitSuccess);
>>>   uint16_t Version = 4;
>>>   uint8_t AddrSize = 8;
>>>   DwarfGenCU &CU = DG.appendCompileUnit(Version, AddrSize);
>>>   DwarfGenDIE CUDie = CU.getUnitDIE();
>>>   CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
>>>   CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);
>>>   DwarfGenDIE SubprogramDie = CUDie.addChild(DW_TAG_subprogram);
>>>   SubprogramDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
>>>   SubprogramDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
>>>   SubprogramDie.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x2000U);
>>>   DwarfGenDIE IntDie = CUDie.addChild(DW_TAG_base_type);
>>>   IntDie.addAttribute(DW_AT_name, DW_FORM_strp, "int");
>>>   IntDie.addAttribute(DW_AT_encoding, DW_FORM_data1, DW_ATE_signed);
>>>   IntDie.addAttribute(DW_AT_byte_size, DW_FORM_data1, 4);
>>>   DwarfGenDIE ArgcDie =
>> SubprogramDie.addChild(DW_TAG_formal_parameter);
>>>   ArgcDie.addAttribute(DW_AT_name, DW_FORM_strp, "argc");
>>>   //ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref_addr, IntDie); //
>> Crashes here...
>>>   DG.generate();
>>>   auto Obj = object::ObjectFile::createObjectFile(Path);
>>>   if (Obj) {
>>>     DWARFContextInMemory DwarfContext(*Obj.get().getBinary());
>>>     uint32_t NumCUs = DwarfContext.getNumCompileUnits();
>>>     for (uint32_t i=0; i<NumCUs; ++i) {
>>>       DWARFCompileUnit *U = DwarfContext.getCompileUnitAtIndex(i);
>>>       if (U)
>>>         U->getUnitDIE(false)->dump(llvm::outs(), U, -1u);
>>>     }
>>>   }
>>> But things fall down if I try to uncomment the DW_FORM_ref_addr line
>> above. The problem is that AsmPrinter really expects a full stack of stuff
>> to be there and expects people to use the DwarfDebug class and all of its
>> associated classes. These associated classes really want to use the "DI"
>> objects (DICompileUnit, etc) so to create a compile unit we would need to
>> create a DICompileUnit object and then make an AsmPrinter/DwarfCompileUnit.
>> That stack is pretty heavy and requires the code shown above to create
>> many many classes just to represent the simple output we wish to emit.
>> Another downside of the AsmPrinter method is we don't know which targets
>> people are going to build into their binaries and thus we don't know which
>> triples we will be able to use when generating DWARF info. Adrian Prantl
>> attempted to help me get things working over here and we kept running into
>> roadblocks.
>>> It'd be great to have more detail about the roadblocks you hit to better
>> understand how bad/what the issues are.
>> A few blocks:
>> - DIEString doesn't support DW_FORM_string. DW_FORM_string support might
>> have been pulled so that we never emit it from clang, but we would want to
>> have a unit test that covers being able to read an inlined C string from a
>> DIE. Support won't be that hard to add, but we might not want it so that
>> people can't use it by accident and make less efficient DWARF.
> Seems to me we originally supported only DW_FORM_string, and then at some
> point it was tossed in favor of DW_FORM_strp in order to get space savings
> from string pooling.  In fact using DW_FORM_string for small strings would
> save some more space (admittedly not much) and a bunch of relocations.
> (I found data from an old experiment, in a debug build of Clang it saved 
> ~0.7MB out of a total 340MB of debug-info size, and >360K ELF relocations.)

This is true, but it also slows down DWARF parsing: a DW_FORM_string value has no fixed size, so the parser has to scan each inline C string to find its end whenever it skips over a DIE's attributes.
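
To illustrate the cost, here is a minimal sketch of skipping a string attribute value. It is not the actual LLVM parser: the helper name skipStringValue is made up for illustration, it assumes 32-bit DWARF (4-byte section offsets), and it handles only the two string forms. The form codes are the values from the DWARF standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Form codes as defined by the DWARF standard.
constexpr uint16_t DW_FORM_string = 0x08;
constexpr uint16_t DW_FORM_strp = 0x0e;

// Hypothetical helper (not an LLVM API): returns the offset just past the
// string attribute value that starts at Offset in the DIE data.
size_t skipStringValue(const std::vector<uint8_t> &Data, size_t Offset,
                       uint16_t Form) {
  assert(Offset <= Data.size() && "offset out of range");
  switch (Form) {
  case DW_FORM_strp:
    // Assumes 32-bit DWARF: a fixed 4-byte offset into .debug_str, so the
    // value is skipped in constant time.
    return Offset + 4;
  case DW_FORM_string:
    // Inline C string: every byte must be inspected to find the terminator.
    while (Offset < Data.size() && Data[Offset] != 0)
      ++Offset;
    // Step past the NUL terminator if one was found.
    return Offset < Data.size() ? Offset + 1 : Offset;
  default:
    assert(false && "form not handled in this sketch");
    return Offset;
  }
}
```

The DW_FORM_strp case is a single addition, while the DW_FORM_string case is a loop over the string bytes, and that loop runs for every such attribute on every DIE the parser walks past.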

> I'd favor an API that passed the string down and let the DIE generator
> (as opposed to the DWARF generator) pick the form.

For now I have added a DIEInlinedString class that can be used for DW_FORM_string attributes.
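
For reference, a form-picking API along the lines Paul suggests might look roughly like the sketch below. The name pickStringForm and the size threshold are assumptions made up for illustration; neither exists in LLVM, and Paul did not say what counts as a "small" string. The policy here inlines a string only when it, including its NUL terminator, is no larger than the section offset that DW_FORM_strp would emit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Form codes as defined by the DWARF standard.
constexpr uint16_t DW_FORM_string = 0x08;
constexpr uint16_t DW_FORM_strp = 0x0e;

// Hypothetical policy (not an LLVM API): the caller passes the string down
// and the generator picks the form. OffsetSize is the size in bytes of a
// .debug_str offset: 4 for 32-bit DWARF, 8 for 64-bit DWARF.
uint16_t pickStringForm(const std::string &Str, unsigned OffsetSize = 4) {
  assert((OffsetSize == 4 || OffsetSize == 8) && "invalid DWARF offset size");
  // Inline the string only if it costs no more bytes in the DIE than the
  // offset would, which also avoids one relocation per use.
  return Str.size() + 1 <= OffsetSize ? DW_FORM_string : DW_FORM_strp;
}
```

With the default 4-byte offset, "int" would be emitted inline and "main" would go to the string pool. A unit test would still need a way to force a specific form, which is what the explicit class is for.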

>> - Asserts, asserts, asserts. As we tried to emit DWARF, we got an assert
>> in bool AsmPrinter::doInitialization(Module &M). On the first line:
>>  MMI = getAnalysisIfAvailable<MachineModuleInfo>();
>> This asserts if you call it while using the AsmPrinter the way the
>> DwarfLinker and the AsmPrinter based DwarfGen do. You must call this to
>> generate the DwarfDebug. If you get past this by installing a Pass then we
>> assert at:
>>  GCModuleInfo *MI = getAnalysisIfAvailable<GCModuleInfo>();
>>  assert(MI && "AsmPrinter didn't require GCModuleInfo?");
>> If we don't have this, we don't get a DwarfDebug.
>>> Even if we end up adding another set of code to generate DWARF (which
>> I'd really like to avoid) we'd want to, at some point, coalesce them back
>> together. Given the goal is to try to coalesce the DWARF parsing code in
>> LLDB and LLVM, it'd seem unfortunate if that effort just created another
>> similar (or larger) amount of work for DWARF generation.
>> This DWARF generator could just live in the unittests/DebugInfo/DWARF
>> directory so it wouldn't pollute anything in LLVM if we do choose to use
>> it.
>>> I wanted to pass this patch along in case someone wants to take a look
>> at how we can possibly fix the lib/CodeGen/DwarfGenerator.cpp and
>> lib/CodeGen/DwarfGenerator.h. The code that sets up all the required
>> classes for the AsmPrinter method is in the DwarfGen class from
>> lib/CodeGen/DwarfGenerator.cpp in the following function:
>>> bool DwarfGen::init(Triple TheTriple, StringRef OutputFilename);
>>> The code in this function was looted from existing DwarfLinker.cpp code.
>> This function requires a valid triple and that triple is used to create a
>> lot of the classes required to make the AsmPrinter. I am not sure if any
>> other code uses the AsmPrinter like this besides the DwarfLinker.cpp code
>> and that code uses its own magic to actually link the DWARF. It does reuse
>> some of the functions as I did, but the DwarfLinker doesn't use any of the
>> DwarfDebug, DwarfCompileUnit or any of the classes that the
>> compiler/assembler uses when making DWARF.
>>> What's the DwarfLinker code missing that you need? If that code is
>> generating essentially arbitrary DWARF, what's blocking using the same
>> technique for generating DWARF for parsing tests?
>> They don't use any of the DwarfDebug, DwarfCompileUnit classes. They also
>> don't use any of the DI classes when making up the debug info. So both the
>> DWARF linker and the generator have similar needs: make DWARF that isn't
>> tied too closely to the clang internal classes and DI classes.
>>> The amount of work required for refactoring the AsmPrinter exceeds the
>> time I am going to have, but I would still like to have DWARF API testing
>> in the unit tests.
>>> So my question is if anyone would have objections to using the
>> standalone DWARF generator in unittests/DebugInfo/DWARF until we can later get
>> the YAML tools to be able to produce DWARF and we can switch to testing
>> the DWARF data that way? Chris Bieneman has expressed interest in getting
>> a DWARF/YAML layer going.
>>> Those tools would still want to use pretty similar (conceptually)
>> abstractions to LLVM's codegen and llvm-dsymutil. I'd still strongly
>> prefer to generalize/keep common APIs here - or better understand why it's
>> not practical now (& what it will take/how we make sure we have a plan and
>> resources to get there eventually).
>>> My reasoning is:
>>> - I want to be able to test DWARF APIs we have to ensure they work
>> correctly as there are no Dwarf API tests right now. I will be adding code
>> that changes many things in the DWARF parser and it will be essential to
>> verify that there are no regressions in the DWARF APIs.
>>> - Not sure which targets would be built into LLVM so it might be hard to
>> write tests that cover 32/64 bit addresses and all the variants if we have
>> to do things legally via AsmPrinter and valid targets
>>> Seems like it might be plausible to refactor out whatever features of
>> the AsmPrinter these APIs require (so we just harvest that data out of
>> AsmPrinter and pass it down in a struct, say - so that other users can
>> pass their own struct without needing an AsmPrinter). Though, again,
>> interested to know how dsymutil is working in these situations.
>> I can try that method if indeed the only places that use the DwarfDebug
>> are the DW_FORM_ref_addr and location lists. I'll let you know how that
>> goes.
>>> - Not enough time to modify AsmPrinter to not require the full DebugInfo
>> stack and the classes that it uses (llvm::DwarfCompileUnit which must use
>> llvm::DICompileUnit, llvm::DIE class which uses many local classes that
>> all depend on  the full DwarfDebug stack).
>>> Will you have time at some later date to come back and revisit this?
>> It's understandable that we may choose to incur short term technical debt
>> with an understanding that it will be paid off in some timely manner. It'd
>> be less desirable if there's no such plan/possibility and we incur a
>> fairly clear case of technical debt (redundant DWARF generation libraries
>> - especially when this effort is to remove a redundant DWARF parser).
>> Not sure anyone else will need to generate DWARF manually. The two clients
>> currently are the DWARF unittests and the DwarfLinker. The DwarfLinker
>> worked around these issues. If the AsmPrinter wasn't such an integral part
>> of the entire compiler stack, I could take a stab at refactoring it, but I
>> don't believe I am the right person to do this at this point as I have no
>> experience or knowledge of the various ways that this class is used, or
>> how it interacts with other support classes (DwarfDebug, and many many
>> other classes).
>> Things that still worry me:
>> - not being able to generate DWARF for 32/64 if targets are missing
> You mean DWARF-32 and DWARF-64 formats?  LLVM doesn't do DWARF-64.
> If you mean 64-bit target-machine addresses, I guess I don't understand
> the problem.  If you have target-dependent tests, then they only work
> when the right targets are there.  This is extremely common and I'm 
> not clear why it would be a problem for the DWARF tests.

I wasn't aware that there were target-dependent tests. Do you know of one in the unittest directory you can point me to? I did mean 32-bit address targets versus 64-bit address targets. I am not sure how I can test 4 and 8 byte addresses reliably. What triple do I use in the unittest? I can't assume x86_64, as we may have been built on a 32-bit ARM system with only the 32-bit ARM targets.
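
This is the appeal of a generator that takes the address size directly instead of deriving it from a triple. As a rough sketch of that idea (the helper names appendAddress and readAddress are hypothetical, and little-endian byte order is assumed), an address can be written and read back at either size without any target being built in:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append Addr as an AddrSize-byte little-endian value,
// the way a DW_FORM_addr attribute would be laid out for that address size.
void appendAddress(std::vector<uint8_t> &Out, uint64_t Addr,
                   uint8_t AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");
  for (uint8_t I = 0; I < AddrSize; ++I)
    Out.push_back(static_cast<uint8_t>(Addr >> (8 * I)));
}

// Hypothetical helper: read an AddrSize-byte little-endian address back.
uint64_t readAddress(const std::vector<uint8_t> &Data, size_t Offset,
                     uint8_t AddrSize) {
  assert(Offset + AddrSize <= Data.size() && "read past end of data");
  uint64_t Addr = 0;
  for (uint8_t I = 0; I < AddrSize; ++I)
    Addr |= static_cast<uint64_t>(Data[Offset + I]) << (8 * I);
  return Addr;
}
```

A test can then loop over address sizes 4 and 8 on any host, which is not possible when the address size can only come from a triple whose target may not have been compiled in.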
>> - DIEString not supporting DW_FORM_string. I can add support, but I don't
>> know if we want it as if we add it people might start using it.
> See above. If the API picked the form this would not be a concern.

For DWARF parsing speed I still prefer DW_FORM_strp.

>> - hacking around asserts by constructing classes and copying code from
>> places that properly use the AsmPrinter the way it is supposed to be used
>> so that we can use it in a way that it wasn't designed to be used.
>>> I made a large effort to try and get things working with the AsmPrinter,
>> so I wanted everyone to know that I tried to get that solution working.
>> Let me know what anyone thinks.
>>> Greg Clayton
> --paulr
