[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Tue Sep 1 10:07:28 PDT 2020


Fair enough - thanks for clarifying the differences! I'd still lean a bit
towards this being dwz-esque (as you say, "an extension of classic dwz"),
using a bit more domain knowledge of terminators and the C++ ODR. Though I'm
not sure dsymutil does rely on the ODR, does it? It relies on it to know
that two names represent the same type, I suppose, but it doesn't assume
the definitions are already identical; instead it merges their members.

But I don't have super strong feelings about the naming.

On Tue, Sep 1, 2020 at 6:36 AM Alexey <avl.lapshin at gmail.com> wrote:

>
> On 01.09.2020 06:27, David Blaikie wrote:
>
> A quick note: The feature as currently proposed sounds like it's an exact
> match for 'dwz'? Is there any benefit to this over the existing dwz
> project? Is it different in some ways I'm not aware of? (I haven't actually
> used dwz, so I might have some mistaken ideas about how it should work)
>
> If it's going to solve the same general problem, but be in the llvm
> project instead, then maybe it should be called llvm-dwz.
>
> It looks like dwz and llvm-dwarfutil do not match exactly in
> functionality.
>
> dwz is a program that attempts to optimize DWARF debugging information
> contained in ELF shared libraries and ELF executables for *size*.
>
> llvm-dwarfutil is a tool for processing debug info (DWARF) located in
> built binary files to improve debug info *quality*, reduce debug info
> *size* and accelerate debug info *processing*.
>
> Things which llvm-dwarfutil is supposed to do and which dwz does not do:
> removing obsolete debug info, building indexes, stripping unneeded debug
> sections, and compressing/decompressing debug sections.
>
> What the two tools have in common is that both reduce debug info size,
> but they use different approaches:
>
> 1. dwz reduces the size of debug info by creating partial compilation
>    units for the duplicated parts, so that these partial compilation
>    units can be imported in every place where the duplicate occurred.
>    AFAIU, that optimization gives the largest size saving.
>
>    Another size-saving optimization is ODR types deduplication.
>
> 2. llvm-dwarfutil reduces the size of debug info by ODR types
>    deduplication, which gives the largest size saving in the
>    llvm-dwarfutil case.
>
>    Another size-saving optimization is removing obsolete debug info
>    (which is actually not only about size but also about correctness).
>
> So it looks like these tools are not equivalent. If we consider
> llvm-dwz to be an extension of classic dwz, then we could probably
> name the tool llvm-dwz.
>
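To make the "ODR types deduplication" idea concrete, here is a minimal
sketch, assuming that a type is identified purely by its fully qualified
name. TypeEntry and deduplicateTypes are hypothetical names used only for
illustration and are not part of DWARFLinker; the real logic is more
involved and, as discussed at the top of this message, merges members
instead of assuming the definitions are identical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a type definition in debug info.
// This is not an LLVM class.
struct TypeEntry {
  std::string QualifiedName; // e.g. "ns::Foo"
  std::size_t Offset;        // offset of this definition in the input
};

// For each input entry, returns the offset of the definition to keep.
// The first definition seen for a name becomes the canonical one; later
// definitions with the same name are redirected to it. Treating "same
// name" as "same type" is exactly the ODR assumption.
inline std::vector<std::size_t>
deduplicateTypes(const std::vector<TypeEntry> &Types) {
  std::unordered_map<std::string, std::size_t> Canonical;
  std::vector<std::size_t> Result;
  Result.reserve(Types.size());
  for (const TypeEntry &T : Types) {
    // emplace() leaves an existing mapping untouched, so the first
    // definition seen for a name always wins.
    auto It = Canonical.emplace(T.QualifiedName, T.Offset).first;
    Result.push_back(It->second);
  }
  return Result;
}
```

Every redirected entry is a definition that no longer has to be emitted,
which is where the size saving comes from.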
>
> Though I understand the desire for this to grow other functionality, like
> DWARF-aware dwp-ing. It might be better for this to busybox and provide
> that functionality under llvm-dwp instead. Or, more likely I suspect, the
> existing llvm-dwp will be rewritten (probably by me) to use more of lld's
> infrastructure to be more efficient (its current object reading/writing
> logic uses LLVM's libObject and MCStreamer, which is a bit inefficient
> for a very content-unaware linking process), and then maybe that could be
> taught to use DwarfLinker as a library to optionally do DWARF-aware
> linking, depending on the user's time/space tradeoff desires. It would
> still benefit from any improvements to the underlying DwarfLinker library
> (at which point that would be shared between llvm-dsymutil, llvm-dwz, and
> llvm-dwp).
>
> On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com> wrote:
>
>> Hi,
>>
>>    We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>>    Any thoughts on this?
>>    Thanks in advance, Alexey.
>>
>> ======================================================================
>>
>> llvm-dwarfutil (Apndx A) is a tool for processing debug info (DWARF)
>> located in built binary files to improve debug info quality,
>> reduce debug info size and accelerate debug info processing.
>> Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
>> WASM (Apndx C).
>>
>> ======================================================================
>>
>> Specifically, the tool would do:
>>
>>    - Remove obsolete debug info which refers to code deleted by the linker
>>      during garbage collection (gc-sections).
>>
>>    - Deduplicate debug type definitions to reduce the size of the
>>      resulting binary.
>>
>>    - Build accelerator/index tables.
>>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>        .debug_pubtypes.
>>
>>    - Strip unneeded tables.
>>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>        .debug_pubtypes.
>>
>>    - Compress or decompress debug info as requested.
>>
>> Possible feature:
>>
>>    - Join split DWARF .dwo files into a single file containing all debug
>>      info (convert split DWARF into monolithic DWARF).
>>
>> ======================================================================
>>
>> User interface:
>>
>>    OVERVIEW: A tool for optimizing debug info located in the built binary.
>>
>>    USAGE: llvm-dwarfutil [options] input output
>>
>>    OPTIONS: (Apndx E)
>>
>> ======================================================================
>>
>> Implementation notes:
>>
>> 1. Removing obsolete debug info would be done using the DWARFLinker LLVM
>>    library.
>>
>> 2. Data type deduplication would be done using the DWARFLinker LLVM
>>    library.
>>
>> 3. Accelerator/index tables would be generated using the DWARFLinker LLVM
>>    library.
>>
>> 4. The interface of the DWARFLinker library would be changed in such a
>>    way that it would be possible to switch various stages on/off:
>>
>>    class DWARFLinker {
>>    public:
>>      // Each setter switches one processing stage on or off.
>>      void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>
>>      void setDoAppleNames(bool DoAppleNames = false);
>>      void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>      void setDoAppleTypes(bool DoAppleTypes = false);
>>      void setDoObjC(bool DoObjC = false);
>>      void setDoDebugPubNames(bool DoDebugPubNames = false);
>>      void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>
>>      void setDoDebugNames(bool DoDebugNames = false);
>>      void setDoGDBIndex(bool DoGDBIndex = false);
>>    };
>>
>> 5. Copying source file contents, stripping tables, and
>>    compressing/decompressing tables would be done by the ObjCopy LLVM
>>    library (extracted from llvm-objcopy):
>>
>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                               object::COFFObjectFile &In, Buffer &Out);
>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                               object::ELFObjectFileBase &In, Buffer &Out);
>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                               object::MachOObjectFile &In, Buffer &Out);
>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                               object::WasmObjectFile &In, Buffer &Out);
>>
>> 6. Address ranges and single addresses pointing to removed code should
>>    be marked with a tombstone value in the input file:
>>
>>     -2 for .debug_ranges and .debug_loc.
>>     -1 for other .debug* tables.
>>
>> 7. Prototype implementation - https://reviews.llvm.org/D86539.
>>
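A minimal sketch of the tombstone values from note 6, assuming that -1 and
-2 are interpreted at the width of a target address (4 or 8 bytes). The
names SectionKind, maxAddress, tombstoneFor and isTombstone are
hypothetical and are not taken from the prototype. The reason
.debug_ranges and .debug_loc get -2 is that in DWARF v4 an entry in those
sections whose first value is the largest representable address is a base
address selection entry, so -1 already has a meaning there.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of where an address is stored.
enum class SectionKind { RangesOrLoc, Other };

// Largest representable address for the given address size in bytes.
inline std::uint64_t maxAddress(unsigned AddressSize) {
  assert(AddressSize == 4 || AddressSize == 8);
  return AddressSize == 8 ? static_cast<std::uint64_t>(UINT64_MAX)
                          : static_cast<std::uint64_t>(UINT32_MAX);
}

// -2 for .debug_ranges/.debug_loc, -1 for everything else, both taken
// at the width of a target address.
inline std::uint64_t tombstoneFor(SectionKind Kind, unsigned AddressSize) {
  const std::uint64_t Max = maxAddress(AddressSize);
  return Kind == SectionKind::RangesOrLoc ? Max - 1 : Max;
}

// True if Address marks code that was removed by the linker.
inline bool isTombstone(std::uint64_t Address, SectionKind Kind,
                        unsigned AddressSize) {
  return Address == tombstoneFor(Kind, AddressSize);
}
```

A consumer that knows these values can skip such entries cheaply instead
of guessing whether a low address such as 0 refers to real code.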
>> ======================================================================
>>
>> Roadmap:
>>
>> 1. Refactor llvm-objcopy to extract its implementation into a separate
>>    library, ObjCopy (in the LLVM tree).
>>
>> 2. Create a command line utility using the existing DWARFLinker and
>>    ObjCopy implementations. The first version is supposed to work with
>>    ELF input object files only. It would take an input ELF file with
>>    unoptimized debug info and create an output ELF file with optimized
>>    debug info. That version would be developed outside the LLVM tree.
>>
>> 3. Make the tool able to work in multi-threaded mode.
>>
>> 4. Consider including it in the LLVM tree.
>>
>> 5. Support DWARF5 tables.
>>
>> ======================================================================
>>
>> Appendix A. Should this tool be implemented as a new tool or as an
>>             extension to dsymutil/llvm-objcopy?
>>
>>     There already exists a tool which removes obsolete debug info on
>>     darwin - dsymutil. Why create another tool instead of extending the
>>     existing dsymutil/llvm-objcopy?
>>
>>     The main functionality of dsymutil is located in a separate library,
>>     DWARFLinker. Thus, the dsymutil utility is a command-line interface
>>     for DWARFLinker. dsymutil has a different kind of input/output data:
>>     it takes several object files and an address map as input and creates
>>     a .dSYM bundle with linked debug info as output. llvm-dwarfutil
>>     would take a built executable as input and create an optimized
>>     executable as output. Additionally, there would be many command-line
>>     options specific to only one of the utilities. This means that these
>>     utilities (each implementing a command-line interface) would differ
>>     significantly. It makes sense not to put another command-line utility
>>     inside the existing dsymutil, but to make it a separate utility. That
>>     is the reason why llvm-dwarfutil is suggested to be implemented not
>>     as a sub-part of dsymutil but as a separate tool.
>>
>>     Please share your preference: whether llvm-dwarfutil should be
>>     a separate utility, or a variant of dsymutil compiled for ELF?
>>
>> ======================================================================
>>
>> Appendix B. The MachO object file format is already supported by dsymutil.
>>     Depending on the decision whether llvm-dwarfutil would be done as a
>>     subproject of dsymutil or as a separate utility, MachO would or would
>>     not be supported.
>>
>> ======================================================================
>>
>> Appendix C. Support for the COFF and WASM object file formats is presented
>>      as a possible future improvement. It would be quite easy to add them,
>>      given that llvm-objcopy already supports these formats. It would also
>>      require supporting the DWARF6-suggested tombstone values (-1/-2).
>>
>> ======================================================================
>>
>> Appendix D. Documentation.
>>
>>    - Proposal for DWARF6 which suggested -1/-2 values for marking bad
>>      addresses: http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>    - dsymutil tool: https://llvm.org/docs/CommandGuide/dsymutil.html
>>    - Proposal "Remove obsolete debug info in lld.":
>>      http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>
>> ======================================================================
>>
>> Appendix E. Possible command line options:
>>
>> DwarfUtil Options:
>>
>>    --build-aranges           - Generate .debug_aranges table.
>>    --build-debug-names       - Generate .debug_names table.
>>    --build-debug-pubnames    - Generate .debug_pubnames table.
>>    --build-debug-pubtypes    - Generate .debug_pubtypes table.
>>    --build-gdb-index         - Generate .gdb_index table.
>>    --compress                - Compress debug tables.
>>    --decompress              - Decompress debug tables.
>>    --deduplicate-types       - Do ODR deduplication for debug types.
>>    --garbage-collect         - Do garbage collecting for debug info.
>>    --num-threads=<n>         - Specify the maximum number (n) of
>>                                simultaneous threads to use when
>>                                optimizing input file. Defaults to the
>>                                number of cores on the current machine.
>>    --strip-all               - Strip all debug tables.
>>    --strip=<name1,name2>     - Strip specified debug info tables.
>>    --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>    --tombstone=<value>       - Tombstone value used as a marker of
>>                                invalid address.
>>      =bfd                    -   BFD default value
>>      =dwarf6                 -   Dwarf v6.
>>    --verbose                 - Enable verbose logging and encoding
>>                                details.
>>
>> Generic Options:
>>
>>    --help                    - Display available options (--help-hidden
>>                                for more)
>>    --version                 - Display the version of this program
>>
>>
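A minimal sketch of how a few of the stage-selecting options from
Appendix E might be turned into on/off flags before they are handed to a
linker object. StageOptions and parseStageOptions are hypothetical names
for illustration only; an actual LLVM tool would more likely rely on
LLVM's own command-line parsing support than on hand-written string
comparisons.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container for the stages a user asked for.
// Everything is off unless requested, matching the "= false" defaults
// in the proposed setter interface.
struct StageOptions {
  bool GarbageCollect = false;
  bool DeduplicateTypes = false;
  bool BuildDebugNames = false;
  bool BuildGdbIndex = false;
  std::vector<std::string> Unknown; // arguments this sketch does not handle
};

// Maps option spellings from Appendix E onto the flags above.
inline StageOptions parseStageOptions(const std::vector<std::string> &Args) {
  StageOptions Opts;
  for (const std::string &Arg : Args) {
    if (Arg == "--garbage-collect")
      Opts.GarbageCollect = true;
    else if (Arg == "--deduplicate-types")
      Opts.DeduplicateTypes = true;
    else if (Arg == "--build-debug-names")
      Opts.BuildDebugNames = true;
    else if (Arg == "--build-gdb-index")
      Opts.BuildGdbIndex = true;
    else
      Opts.Unknown.push_back(Arg); // left for the caller to report
  }
  return Opts;
}
```

Keeping the parsed flags in a plain struct keeps the command-line layer
separate from the library, which fits the argument in Appendix A that the
utilities differ mainly in their command-line interface.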