[PATCH] D26469: Improve DWARF parsing and attribute retrieval speed by improving DWARF abbreviation declarations.

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 9 16:06:00 PST 2016


On Wed, Nov 9, 2016 at 1:39 PM Greg Clayton <clayborg at gmail.com> wrote:

> clayborg created this revision.
> clayborg added reviewers: aprantl, dblaikie, echristo, llvm-commits.
> clayborg set the repository for this revision to rL LLVM.
> Herald added subscribers: modocache, mgorny, mehdi_amini.
>
> If you take a look at large DWARF files you will notice there are not that
> many abbreviation declarations. For a large binary that has debug
> information for LLVM, Clang and LLDB, there are only ~500 abbreviations. By
> adding some new data members to the DWARFAbbreviationDeclaration class, we
> can parse DWARF faster and also retrieve DWARF attribute values faster.
>
> To speed up the DWARF parsing we do a little extra work when parsing the
> DWARFAbbreviationDeclaration. We determine if a
> DWARFAbbreviationDeclaration has a fixed byte size and we remember how to
> calculate the fixed byte size using a new
> DWARFAbbreviationDeclaration::FixedSizeInfo structure that is stored as an
> optional value inside the DWARFAbbreviationDeclaration. When parsing the
> DIEs in a compile unit, we must extract DWARFDebugInfoEntryMinimal
> objects. In the extract function we check this
> DWARFAbbreviationDeclaration::FixedAttributeSize optional value to see if
> the byte size is fixed and if so, we can skip all attributes in one
> operation instead of iterating through all of the attribute/form pairs and
> individually skipping each one.
>
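
The fixed-size check described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in the patch: Form, UnitInfo, computeFixedSizeInfo and skipFixedSizeAttributes are hypothetical stand-ins, std::optional is used in place of llvm::Optional, and only a handful of forms are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the LLVM types named in the description.
enum class Form { Data1, Data2, Data4, Data8, Addr, Strp, String };

struct UnitInfo {
  uint8_t AddrSize;   // size of a DW_FORM_addr value in this unit
  uint8_t OffsetSize; // size of a DW_FORM_strp value (4 in 32-bit DWARF)
};

// Computed once per abbreviation; unit-dependent sizes are kept as counts
// so the same abbreviation can be sized for any unit.
struct FixedSizeInfo {
  uint32_t NumBytes = 0;   // bytes that do not depend on the unit
  uint32_t NumAddrs = 0;   // number of address-sized values
  uint32_t NumOffsets = 0; // number of offset-sized values

  uint64_t getByteSize(const UnitInfo &U) const {
    return uint64_t(NumBytes) + uint64_t(NumAddrs) * U.AddrSize +
           uint64_t(NumOffsets) * U.OffsetSize;
  }
};

// Returns std::nullopt as soon as one form has a variable length.
std::optional<FixedSizeInfo>
computeFixedSizeInfo(const std::vector<Form> &Forms) {
  FixedSizeInfo Info;
  for (Form F : Forms) {
    switch (F) {
    case Form::Data1: Info.NumBytes += 1; break;
    case Form::Data2: Info.NumBytes += 2; break;
    case Form::Data4: Info.NumBytes += 4; break;
    case Form::Data8: Info.NumBytes += 8; break;
    case Form::Addr:  Info.NumAddrs += 1; break;
    case Form::Strp:  Info.NumOffsets += 1; break;
    case Form::String: return std::nullopt; // inline C string
    }
  }
  return Info;
}

// Advances Offset past every attribute value in one step when the size is
// fixed. Returns false when the caller must skip attribute by attribute.
bool skipFixedSizeAttributes(const std::optional<FixedSizeInfo> &Info,
                             const UnitInfo &U, uint64_t &Offset) {
  if (!Info)
    return false;
  assert(U.AddrSize != 0 && "unit must have a valid address size");
  Offset += Info->getByteSize(U);
  return true;
}
```

With an 8-byte address size, an abbreviation of (strp, addr, addr) is skipped as a single 20-byte step, while any abbreviation containing an inline string falls back to the per-attribute loop.
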
> To speed up attribute value extraction, we get rid of the
> DWARFFormValue::getFixedFormSizes(...) static function and store an
> optional byte size with each attribute/form pair. The old
> DWARFAbbreviationDeclaration::AttributeSpec was:
>
>   class DWARFAbbreviationDeclaration {
>     struct AttributeSpec {
>       dwarf::Attribute Attr;
>       dwarf::Form Form;
>     };
>   };
>
> The new one adds an optional ByteSize:
>
>   class DWARFAbbreviationDeclaration {
>   public:
>     struct AttributeSpec {
>       dwarf::Attribute Attr;
>       dwarf::Form Form;
>       Optional<uint8_t> ByteSize;
>     };
>   };
>
> This allows us to not have to calculate fixed form sizes each time we
> parse a DIE. Member functions were added to DWARFAbbreviationDeclaration,
> DWARFAbbreviationDeclaration::AttributeSpec and DWARFFormValue to
> centralize the information for each AttributeSpec and to be able to
> calculate the byte size given a DWARFUnit for a
> DWARFAbbreviationDeclaration as a whole, and if that fails, each
> AttributeSpec individually. We also added a map to convert dwarf::Attribute
> enum values into attribute indexes.
>
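
To make the per-attribute ByteSize and the attribute-to-index map concrete, here is a small standalone sketch. It is an illustration under assumptions, not the patch itself: AbbrevDecl, findAttributeIndex and getValueOffset are hypothetical names, std::optional replaces llvm::Optional, and byte sizes are assumed to be already resolved for a single unit rather than computed from a DWARFUnit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for dwarf::Attribute and dwarf::Form.
enum class Attribute : uint16_t { Name, Language, LowPC, HighPC };
enum class Form : uint16_t { Data2, Addr, Strp, String };

struct AttributeSpec {
  Attribute Attr;
  Form F;
  std::optional<uint8_t> ByteSize; // empty when the form is variable length
};

class AbbrevDecl {
  std::vector<AttributeSpec> Specs;
  std::unordered_map<Attribute, size_t> AttrToIndex;

public:
  // Size is assumed to be resolved already for the unit being parsed.
  void addAttribute(Attribute A, Form F, std::optional<uint8_t> Size) {
    bool Inserted = AttrToIndex.emplace(A, Specs.size()).second;
    assert(Inserted && "attribute appears twice in one abbreviation");
    (void)Inserted;
    Specs.push_back({A, F, Size});
  }

  // Map lookup instead of a linear scan over the attribute/form pairs.
  std::optional<size_t> findAttributeIndex(Attribute A) const {
    auto It = AttrToIndex.find(A);
    if (It == AttrToIndex.end())
      return std::nullopt;
    return It->second;
  }

  // Offset of A's value from the start of the DIE's attribute data, or
  // nullopt if A is absent or a preceding attribute has no fixed size
  // (the caller would then have to decode the earlier values in order).
  std::optional<uint64_t> getValueOffset(Attribute A) const {
    std::optional<size_t> Index = findAttributeIndex(A);
    if (!Index)
      return std::nullopt;
    uint64_t Offset = 0;
    for (size_t I = 0; I < *Index; ++I) {
      if (!Specs[I].ByteSize)
        return std::nullopt;
      Offset += *Specs[I].ByteSize;
    }
    return Offset;
  }
};
```

The cached sizes mean a lookup such as DW_AT_low_pc only has to add up the stored sizes of the attributes before it, without consulting a per-form size table on every DIE.
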
> These fixes improve DWARF parsing speed by around 7 percent. The test was
> done by parsing an LLDB build that contains full debug info for LLDB, Clang
> and LLVM where we grab all compile units, extract all DIEs, traverse each
> DIE in the hierarchy and ask each one for its name by extracting the
> DW_AT_name attribute (if any) and extracting the DW_AT_low_pc attribute.
>
> Previously there were no DWARF unittests that actually tested DWARF parsing.
> I have added a dwarf_gen::DWARFGenerator class that allows C++ code to
> easily create DWARF debug info and encode it into section data. Example
> code for generating DWARF:
>
>   using namespace dwarf_gen;
>   // Create a DWARF generator object
>   DWARFGenerator Dwarf;
>   // Create a compile unit with the specified DWARF version and address
>   // size
>   CompileUnit &CU = Dwarf.appendCompileUnit(Version, AddrSize);
>
>   // Append a few attributes to the compile unit's DIE:
>   CU.Die.appendAttribute({DW_AT_name, DW_FORM_strp, "/tmp/main.c"});
>   CU.Die.appendAttribute({DW_AT_language, DW_FORM_data2, DW_LANG_C});
>
>   // Create a DW_TAG_subprogram DIE as a child of the compile unit DIE and
>   // add some attributes to it
>   DIE &SubprogramDie = CU.Die.appendChild(DW_TAG_subprogram);
>   SubprogramDie.appendAttribute({DW_AT_name, DW_FORM_strp, "main"});
>   SubprogramDie.appendAttribute({DW_AT_low_pc, DW_FORM_addr, 0x1000U});
>   SubprogramDie.appendAttribute({DW_AT_high_pc, DW_FORM_addr, 0x2000U});
>
>   // Create a DW_TAG_base_type type DIE as a child of the compile unit DIE
>   // and add some attributes to it
>   DIE &IntDie = CU.Die.appendChild(DW_TAG_base_type);
>   IntDie.appendAttribute({DW_AT_name, DW_FORM_strp, "int"});
>   IntDie.appendAttribute({DW_AT_encoding, DW_FORM_data1, DW_ATE_signed});
>   IntDie.appendAttribute({DW_AT_byte_size, DW_FORM_data1, 4});
>
>   // Create a DW_TAG_formal_parameter DIE as a child of the subprogram DIE
>   // and add some attributes to it.
>   DIE &ArgcDie = SubprogramDie.appendChild(DW_TAG_formal_parameter);
>   ArgcDie.appendAttribute({DW_AT_name, DW_FORM_strp, "argc"});
>   ArgcDie.appendAttribute({DW_AT_type, DW_FORM_ref4, &IntDie});
>
>   // Generate the DWARF
>   DWARFSections DwarfSections;
>   Dwarf.generate(DwarfSections);
>
>   // Now make a DWARFContextInMemory using the given section data that was
>   // generated and use LLVM's DWARF API to extract info from it.
>   DWARFContextInMemory dwarfContext(
>       LittleEndian, AddrSize, DwarfSections.getDebugAbbrevData(),
>       DwarfSections.getDebugInfoData(), DwarfSections.getDebugStrData());
>   uint32_t NumCUs = dwarfContext.getNumCompileUnits();
>   DWARFCompileUnit *U = dwarfContext.getCompileUnitAtIndex(0);
>   DWARFDebugInfoEntryMinimal* Die = U->getUnitDIE(false);
>
> The DWARF generator is a separate code base from the parser and that
> ensures that we don't end up with symmetric encode/decode errors.
>

I'd rather use the same codebase/reduce duplication and test the
encoding/decoding from known files/byte dumps if needed.

Also - it'd be good if the DWARF generation code was the same as the code
LLVM uses to generate DWARF, rather than having two generators - unless
there's a particularly compelling benefit/difference in use cases.

(& could be unit tested separately - but the DWARF generation code already
in LLVM's pretty well tested in the LLVM tests)


>
> A full suite of unit tests was added that tests decoding all DW_FORM_XXX
> values that we currently support using DWARF version 2, 3, and 4. Tests were
> also added for parsing a known chunk of DWARF and ensuring that we can
> extract it, and get the children and sibling DIEs as expected.
>
>
> Repository:
>   rL LLVM
>
> https://reviews.llvm.org/D26469
>
> Files:
>   include/llvm/DebugInfo/DWARF/DWARFAbbreviationDeclaration.h
>   include/llvm/DebugInfo/DWARF/DWARFContext.h
>   include/llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h
>   include/llvm/DebugInfo/DWARF/DWARFFormValue.h
>   lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
>   lib/DebugInfo/DWARF/DWARFContext.cpp
>   lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
>   lib/DebugInfo/DWARF/DWARFFormValue.cpp
>   lib/DebugInfo/DWARF/DWARFUnit.cpp
>   unittests/DebugInfo/DWARF/CMakeLists.txt
>   unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
>   unittests/DebugInfo/DWARF/DWARFFormValueTest.cpp
>   unittests/DebugInfo/DWARF/DWARFGenerator.cpp
>   unittests/DebugInfo/DWARF/DWARFGenerator.h
>
>