[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Alexey via llvm-dev
llvm-dev at lists.llvm.org
Wed Sep 2 09:56:52 PDT 2020
On 01.09.2020 20:07, David Blaikie wrote:
> Fair enough - thanks for clarifying the differences! (I'd still lean a
> bit towards this being dwz-esque, as you say "an extension of classic dwz"
I have some doubts about the name "llvm-dwz", since it might confuse
people who would expect exactly the same behavior as dwz.
But if we think of it as "an extension of classic dwz", and the possible
confusion is not a big deal, then
I would be fine with "llvm-dwz".
> using a bit more domain knowledge (of terminators and C++ odr - though
> I'm not sure dsymutil does rely on the ODR, does it? It relies on it
> to know that two names represent the same type, I suppose, but doesn't
> assume they're already identical, instead it merges their members))
If dsymutil is able to find a full definition, it removes all other
definitions (matched by name) and redirects all references to the
definition it found. If it is not able to find a full definition, it
does nothing; i.e. if there are two incomplete definitions
(DW_AT_declaration (true)) with the same name, they are not merged.
That is a possible improvement: teaching dsymutil to merge incomplete
types.
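
To make the rule above concrete, here is a minimal sketch of name-based type deduplication. It is not dsymutil's or DWARFLinker's actual code: `TypeEntry` and `resolveCanonicalTypes` are hypothetical names invented for illustration, type DIEs are reduced to a name plus a declaration flag, and the sketch additionally assumes that declarations are redirected to a full definition when one exists.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a type DIE; not an LLVM type.
struct TypeEntry {
  std::string QualifiedName; // name used for ODR matching
  bool IsDeclaration;        // true when DW_AT_declaration is set
};

// For every entry, returns the index of the entry that references should
// point to after deduplication. If a name has a full definition, all entries
// with that name are redirected to the first such definition. If a name has
// only declarations, each entry keeps its own index (nothing is merged).
std::vector<std::size_t>
resolveCanonicalTypes(const std::vector<TypeEntry> &Types) {
  std::unordered_map<std::string, std::size_t> FirstFullDefinition;
  for (std::size_t I = 0; I < Types.size(); ++I)
    if (!Types[I].IsDeclaration)
      // emplace does not overwrite, so the first full definition wins.
      FirstFullDefinition.emplace(Types[I].QualifiedName, I);

  std::vector<std::size_t> Canonical(Types.size());
  for (std::size_t I = 0; I < Types.size(); ++I) {
    auto It = FirstFullDefinition.find(Types[I].QualifiedName);
    Canonical[I] = (It != FirstFullDefinition.end()) ? It->second : I;
  }
  return Canonical;
}
```

With entries {A (declaration), A (definition), A (definition), B (declaration), B (declaration)}, all three A entries resolve to the first A definition, while the two B declarations stay separate. The two B declarations are exactly the case the possible improvement would address.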
Alexey.
>
> But I don't have super strong feelings about the naming.
>
> On Tue, Sep 1, 2020 at 6:36 AM Alexey <avl.lapshin at gmail.com> wrote:
>
>
> On 01.09.2020 06:27, David Blaikie wrote:
>> A quick note: The feature as currently proposed sounds like it's
>> an exact match for 'dwz'? Is there any benefit to this over the
>> existing dwz project? Is it different in some ways I'm not aware
>> of? (I haven't actually used dwz, so I might have some mistaken
>> ideas about how it should work)
>>
>> If it's going to solve the same general problem, but be in the
>> llvm project instead, then maybe it should be called llvm-dwz.
> It looks like dwz and llvm-dwarfutil do not exactly match in
> functionality.
>
> dwz is a program that attempts to optimize DWARF debugging information
> contained in ELF shared libraries and ELF executables for *size*.
>
> llvm-dwarfutil is a tool that is used for processing debug info (DWARF)
> located in built binary files to improve debug info *quality*,
> reduce debug info *size* and accelerate debug info *processing*.
>
> Things that llvm-dwarfutil is supposed to do and that dwz does not do:
> removing obsolete debug info, building indexes, stripping unneeded
> debug sections, compressing/decompressing debug sections.
>
> What the two tools have in common is debug info size reduction,
> but they achieve it using different approaches:
>
> 1. dwz reduces the size of debug info by creating partial compilation
> units for duplicated parts, so that these partial compilation units
> can be imported at every place where a duplicate occurred. AFAIU, that
> optimization gives the largest size saving.
>
> Another size-saving optimization is ODR type deduplication.
>
> 2. llvm-dwarfutil reduces the size of debug info by ODR type
> deduplication, which gives the largest size saving in the
> llvm-dwarfutil case.
>
> Another size-saving optimization is removing obsolete debug info
> (which is actually not only about size but also about correctness).
>
> So it looks like these tools are not equivalent. If we consider the
> tool to be an extension of classic dwz, then we could probably
> name it llvm-dwz.
>
>>
>> Though I understand the desire for this to grow other
>> functionality, like DWARF-aware dwp-ing. Might be better for this
>> to busybox and provide that functionality under llvm-dwp instead,
>> or more likely, I suspect, that the existing llvm-dwp will be
>> rewritten (probably by me) to use more of lld's infrastructure to
>> be more efficient (its current object reading/writing logic is
>> using LLVM's libObject and MCStreamer, which is a bit inefficient
>> for a very content-unaware linking process) and then maybe that
>> could be taught to use DwarfLinker as a library to optionally do
>> DWARF-aware linking depending on the user's time/space tradeoff
>> desires. Still benefiting from any improvements to the underlying
>> DwarfLinker library (at which point that would be shared between
>> llvm-dsymutil, llvm-dwz, and llvm-dwp).
>>
>> On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com> wrote:
>>
>> Hi,
>>
>> We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>> Any thoughts on this?
>> Thanks in advance, Alexey.
>>
>> ======================================================================
>>
>> llvm-dwarfutil (Apndx A) is a tool that is used for processing debug
>> info (DWARF) located in built binary files to improve debug info
>> quality, reduce debug info size and accelerate debug info processing.
>> Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
>> WASM (Apndx C).
>>
>> ======================================================================
>>
>> Specifically, the tool would do:
>>
>> - Remove obsolete debug info which refers to code deleted by the
>> linker during garbage collection (gc-sections).
>>
>> - Deduplicate debug type definitions to reduce the size of the
>> resulting binary.
>>
>> - Build accelerator/index tables.
>> = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>> .debug_pubtypes.
>>
>> - Strip unneeded tables.
>> = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>> .debug_pubtypes.
>>
>> - Compress or decompress debug info as requested.
>>
>> Possible feature:
>>
>> - Join split DWARF .dwo files into a single file containing all
>> debug info (convert split DWARF into monolithic DWARF).
>>
>> ======================================================================
>>
>> User interface:
>>
>> OVERVIEW: A tool for optimizing debug info located in the built binary.
>>
>> USAGE: llvm-dwarfutil [options] input output
>>
>> OPTIONS: (Apndx E)
>>
>> ======================================================================
>>
>> Implementation notes:
>>
>> 1. Removing obsolete debug info would be done using the DWARFLinker
>> LLVM library.
>>
>> 2. Data type deduplication would be done using the DWARFLinker LLVM
>> library.
>>
>> 3. Accelerator/index tables would be generated using the DWARFLinker
>> LLVM library.
>>
>> 4. The interface of the DWARFLinker library would be changed in such a
>> way that it would be possible to switch various stages on and off:
>>
>> class DWARFLinker {
>> public:
>>   // Return types are not given in the proposal; void is assumed here.
>>   void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>
>>   void setDoAppleNames(bool DoAppleNames = false);
>>   void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>   void setDoAppleTypes(bool DoAppleTypes = false);
>>   void setDoObjC(bool DoObjC = false);
>>   void setDoDebugPubNames(bool DoDebugPubNames = false);
>>   void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>
>>   void setDoDebugNames(bool DoDebugNames = false);
>>   void setDoGDBIndex(bool DoGDBIndex = false);
>> };
>>
>> 5. Copying source file contents, stripping tables,
>> compressing/decompressing tables would be done by the ObjCopy LLVM
>> library (extracted from llvm-objcopy):
>>
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                              object::COFFObjectFile &In, Buffer &Out);
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                              object::ELFObjectFileBase &In, Buffer &Out);
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                              object::MachOObjectFile &In, Buffer &Out);
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                              object::WasmObjectFile &In, Buffer &Out);
>>
>> 6. Address ranges and single addresses pointing to removed code
>> should be marked with a tombstone value in the input file:
>>
>> -2 for .debug_ranges and .debug_loc.
>> -1 for other .debug* tables.
>>
>> 7. Prototype implementation - https://reviews.llvm.org/D86539.
>>
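
As an illustration of the tombstone rule in note 6, here is a minimal sketch that computes the marker for a section and checks an address against it. The function names `tombstoneFor` and `isTombstone` are hypothetical and are not part of DWARFLinker or lld. The sketch assumes that -1 and -2 are read as unsigned values of the target address width. One likely reason for the separate -2 value is that in DWARF v4 .debug_ranges and .debug_loc the largest representable address already marks a base address selection entry.

```cpp
#include <cstdint>
#include <string>

// Returns the tombstone value for a section, given the target address size
// in bytes (e.g. 4 or 8). Hypothetical helper, for illustration only.
std::uint64_t tombstoneFor(const std::string &SectionName,
                           unsigned AddressSize) {
  // Largest address representable in AddressSize bytes, i.e. "-1".
  const std::uint64_t MaxAddress =
      AddressSize >= 8 ? UINT64_MAX
                       : ((std::uint64_t{1} << (AddressSize * 8)) - 1);
  // .debug_ranges and .debug_loc use "-2"; every other table uses "-1".
  const bool UsesMinusTwo =
      SectionName == ".debug_ranges" || SectionName == ".debug_loc";
  return UsesMinusTwo ? MaxAddress - 1 : MaxAddress;
}

// True when Address marks removed code in the given section.
bool isTombstone(std::uint64_t Address, const std::string &SectionName,
                 unsigned AddressSize) {
  return Address == tombstoneFor(SectionName, AddressSize);
}
```

A tool consuming such input could call `isTombstone` on the start address of a range or on a DW_AT_low_pc value and drop the corresponding debug info when it returns true.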
>> ======================================================================
>>
>> Roadmap:
>>
>> 1. Refactor llvm-objcopy to extract its implementation into a separate
>> library, ObjCopy (in the LLVM tree).
>>
>> 2. Create a command-line utility using the existing DWARFLinker and
>> ObjCopy implementations. The first version is supposed to work only
>> with ELF input object files. It would take an input ELF file with
>> unoptimized debug info and create an output ELF file with optimized
>> debug info. That version would be developed outside the LLVM tree.
>>
>> 3. Make the tool able to work in multi-threaded mode.
>>
>> 4. Consider including it in the LLVM tree.
>>
>> 5. Support DWARF5 tables.
>>
>> ======================================================================
>>
>> Appendix A. Should this tool be implemented as a new tool or as an
>> extension to dsymutil/llvm-objcopy?
>>
>> There already exists a tool which removes obsolete debug info on
>> darwin: dsymutil. Why create another tool instead of extending the
>> existing dsymutil/llvm-objcopy?
>>
>> The main functionality of dsymutil is located in a separate library,
>> DWARFLinker. Thus, the dsymutil utility is a command-line interface
>> for DWARFLinker. dsymutil has a different kind of input/output data:
>> it takes several object files and an address map as input and creates
>> a .dSYM bundle with linked debug info as output. llvm-dwarfutil would
>> take a built executable as input and create an optimized executable
>> as output. Additionally, there would be many command-line options
>> specific to only one of the utilities. This means that these
>> utilities (implementing the command-line interface) would differ
>> significantly. It makes sense not to put another command-line utility
>> inside the existing dsymutil, but to make it a separate utility. That
>> is why llvm-dwarfutil is proposed to be implemented not as a sub-part
>> of dsymutil but as a separate tool.
>>
>> Please share your preference: should llvm-dwarfutil be a separate
>> utility, or a variant of dsymutil compiled for ELF?
>>
>> ======================================================================
>>
>> Appendix B. The MachO object file format is already supported by
>> dsymutil. Whether MachO would be supported by llvm-dwarfutil depends
>> on the decision whether it is done as a subproject of dsymutil or as
>> a separate utility.
>>
>> ======================================================================
>>
>> Appendix C. Support for the COFF and WASM object file formats is
>> presented as a possible future improvement. It would be quite easy to
>> add them, given that llvm-objcopy already supports these formats. It
>> would also require supporting the DWARF6-suggested tombstone values
>> (-1/-2).
>>
>> ======================================================================
>>
>> Appendix D. Documentation.
>>
>> - proposal for DWARF6 which suggested -1/-2 values for marking bad
>> addresses
>> http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>> - dsymutil tool
>> https://llvm.org/docs/CommandGuide/dsymutil.html
>> - proposal "Remove obsolete debug info in lld."
>> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>
>> ======================================================================
>>
>> Appendix E. Possible command line options:
>>
>> DwarfUtil Options:
>>
>> --build-aranges            - generate .debug_aranges table.
>> --build-debug-names        - generate .debug_names table.
>> --build-debug-pubnames     - generate .debug_pubnames table.
>> --build-debug-pubtypes     - generate .debug_pubtypes table.
>> --build-gdb-index          - generate .gdb_index table.
>> --compress                 - Compress debug tables.
>> --decompress               - Decompress debug tables.
>> --deduplicate-types        - Do ODR deduplication for debug types.
>> --garbage-collect          - Do garbage collecting for debug info.
>> --num-threads=<n>          - Specify the maximum number (n) of
>>                              simultaneous threads to use when
>>                              optimizing input file. Defaults to the
>>                              number of cores on the current machine.
>> --strip-all                - Strip all debug tables.
>> --strip=<name1,name2>      - Strip specified debug info tables.
>> --strip-unoptimized-debug  - Strip all unoptimized debug tables.
>> --tombstone=<value>        - Tombstone value used as a marker of
>>                              invalid address.
>>   =bfd                     -   BFD default value
>>   =dwarf6                  -   Dwarf v6.
>> --verbose                  - Enable verbose logging and encoding
>>                              details.
>>
>> Generic Options:
>>
>> --help                     - Display available options (--help-hidden
>>                              for more)
>> --version                  - Display the version of this program
>>