[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

Alexey via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 2 09:56:52 PDT 2020


On 01.09.2020 20:07, David Blaikie wrote:
> Fair enough - thanks for clarifying the differences! (I'd still lean a 
> bit towards this being dwz-esque, as you say "an extension of classic dwz"
I am a little hesitant about "llvm-dwz", since it might confuse people who
would expect exactly the same behavior as dwz.
But if we think of it as "an extension of classic dwz" and the possible
confusion is not a big deal, then I would be fine with "llvm-dwz".
> using a bit more domain knowledge (of terminators and C++ odr - though 
> I'm not sure dsymutil does rely on the ODR, does it? It relies on it 
> to know that two names represent the same type, I suppose, but doesn't 
> assume they're already identical, instead it merges their members))

If dsymutil is able to find a full definition, then it removes all
other definitions (matched by name) and redirects all references to the
definition it found. If it is not able to find a full definition, then it
does nothing; i.e. if there are two incomplete definitions
(DW_AT_declaration (true)) with the same name, then they are not merged.
That is a possible improvement: teach dsymutil to merge incomplete types.
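
To make the behavior described above concrete, here is a minimal sketch.
It is not dsymutil's code: TypeDie and resolveOdrTypes are hypothetical
names used only for illustration, and matching is done on a plain string,
while the real implementation works on DWARF DIEs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a type DIE; not an LLVM type.
struct TypeDie {
  std::string QualifiedName; // name used for matching
  bool IsDeclaration;        // models DW_AT_declaration (true)
};

// For each input DIE, returns the index of the DIE that references should
// point to after deduplication.
std::vector<std::size_t> resolveOdrTypes(const std::vector<TypeDie> &Dies) {
  // First pass: remember the first full definition seen for every name.
  std::unordered_map<std::string, std::size_t> FullDef;
  for (std::size_t I = 0; I < Dies.size(); ++I)
    if (!Dies[I].IsDeclaration)
      FullDef.emplace(Dies[I].QualifiedName, I); // emplace keeps the first

  // Second pass: redirect to the full definition if one exists; otherwise
  // leave the DIE alone, so same-named declarations stay unmerged.
  std::vector<std::size_t> Target(Dies.size());
  for (std::size_t I = 0; I < Dies.size(); ++I) {
    auto It = FullDef.find(Dies[I].QualifiedName);
    Target[I] = It != FullDef.end() ? It->second : I;
    assert(Target[I] < Dies.size() && "target must be a valid DIE index");
  }
  return Target;
}
```

With the input {A decl, A def, A def, B decl, B decl}, all three "A"
entries resolve to the first full definition of A, while the two "B"
declarations keep pointing at themselves.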

Alexey.

>
> But I don't have super strong feelings about the naming.
>
> On Tue, Sep 1, 2020 at 6:36 AM Alexey <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>
>     On 01.09.2020 06:27, David Blaikie wrote:
>>     A quick note: The feature as currently proposed sounds like it's
>>     an exact match for 'dwz'? Is there any benefit to this over the
>>     existing dwz project? Is it different in some ways I'm not aware
>>     of? (I haven't actually used dwz, so I might have some mistaken
>>     ideas about how it should work)
>>
>>     If it's going to solve the same general problem, but be in the
>>     llvm project instead, then maybe it should be called llvm-dwz.
>     It looks like dwz and llvm-dwarfutil do not exactly match in
>     functionality.
>
>     dwz is a program that attempts to optimize DWARF debugging
>     information contained in ELF shared libraries and ELF executables
>     for *size*.
>
>     llvm-dwarfutil is a tool for processing debug info (DWARF) located
>     in built binary files to improve debug info *quality*, reduce debug
>     info *size* and accelerate debug info *processing*.
>
>     Things that llvm-dwarfutil is supposed to do and that dwz does not
>     do: removing obsolete debug info, building indexes, stripping
>     unneeded debug sections, compressing/decompressing debug sections.
>
>     What the tools have in common is that both reduce debug info size.
>     But they do this using different approaches:
>
>     1. dwz reduces the size of debug info by creating partial
>        compilation units for duplicated parts, so that these partial
>        compilation units can be imported in every place where the
>        duplicate occurred. AFAIU, that optimization gives the largest
>        size saving.
>
>        Another size-saving optimization is ODR type deduplication.
>
>     2. llvm-dwarfutil reduces the size of debug info by ODR type
>        deduplication, which gives the largest size saving in the
>        llvm-dwarfutil case.
>
>        Another size-saving optimization is removing obsolete debug info
>        (which is actually not only about size but also about
>        correctness).
>
>     So it looks like these tools are not equivalent. If we consider
>     llvm-dwz to be an extension of classic dwz, then we could probably
>     name the tool llvm-dwz.
>
>>
>>     Though I understand the desire for this to grow other
>>     functionality, like DWARF-aware dwp-ing. Might be better for this
>>     to busybox and provide that functionality under llvm-dwp instead,
>>     or more likely, I suspect, that the existing llvm-dwp will be
>>     rewritten (probably by me) to use more of lld's infrastructure to
>>     be more efficient (its current object reading/writing logic is
>>     using LLVM's libObject and MCStreamer, which is a bit inefficient
>>     for a very content-unaware linking process) and then maybe that
>>     could be taught to use DwarfLinker as a library to optionally do
>>     DWARF-aware linking depending on the user's time/space tradeoff
>>     desires. Still benefiting from any improvements to the underlying
>>     DwarfLinker library (at which point that would be shared between
>>     llvm-dsymutil, llvm-dwz, and llvm-dwp).
>>
>>     On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com
>>     <mailto:avl.lapshin at gmail.com>> wrote:
>>
>>         Hi,
>>
>>            We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>>            Any thoughts on this?
>>            Thanks in advance, Alexey.
>>
>>         ======================================================================
>>
>>         llvm-dwarfutil (Apndx A) is a tool for processing debug info
>>         (DWARF) located in built binary files to improve debug info
>>         quality, reduce debug info size and accelerate debug info
>>         processing. Supported object file formats: ELF, MachO (Apndx B),
>>         COFF (Apndx C), WASM (Apndx C).
>>
>>         ======================================================================
>>
>>         Specifically, the tool would do:
>>
>>            - Remove obsolete debug info which refers to code deleted by
>>              the linker doing garbage collection (gc-sections).
>>
>>            - Deduplicate debug type definitions to reduce the resulting
>>              size of the binary.
>>
>>            - Build accelerator/index tables.
>>              = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>                .debug_pubtypes.
>>
>>            - Strip unneeded tables.
>>              = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>                .debug_pubtypes.
>>
>>            - Compress or decompress debug info as requested.
>>
>>         Possible feature:
>>
>>            - Join split DWARF .dwo files into a single file containing
>>              all debug info (convert split DWARF into monolithic DWARF).
>>
>>         ======================================================================
>>
>>         User interface:
>>
>>            OVERVIEW: A tool for optimizing debug info located in the
>>                      built binary.
>>
>>            USAGE: llvm-dwarfutil [options] input output
>>
>>            OPTIONS: (Apndx E)
>>
>>         ======================================================================
>>
>>         Implementation notes:
>>
>>         1. Removing obsolete debug info would be done using the
>>            DWARFLinker LLVM library.
>>
>>         2. Data type deduplication would be done using the DWARFLinker
>>            LLVM library.
>>
>>         3. Accelerator/index tables would be generated using the
>>            DWARFLinker LLVM library.
>>
>>         4. The interface of the DWARFLinker library would be changed in
>>            such a way that it would be possible to switch various stages
>>            on/off:
>>
>>            class DWARFLinker {
>>            public:
>>              void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>
>>              void setDoAppleNames(bool DoAppleNames = false);
>>              void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>              void setDoAppleTypes(bool DoAppleTypes = false);
>>              void setDoObjC(bool DoObjC = false);
>>              void setDoDebugPubNames(bool DoDebugPubNames = false);
>>              void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>
>>              void setDoDebugNames(bool DoDebugNames = false);
>>              void setDoGDBIndex(bool DoGDBIndex = false);
>>            };
>>
>>         5. Copying source file contents, stripping tables and
>>            compressing/decompressing tables would be done by the ObjCopy
>>            LLVM library (extracted from llvm-objcopy):
>>
>>            Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                         object::COFFObjectFile &In,
>>                                         Buffer &Out);
>>            Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                         object::ELFObjectFileBase &In,
>>                                         Buffer &Out);
>>            Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                         object::MachOObjectFile &In,
>>                                         Buffer &Out);
>>            Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                         object::WasmObjectFile &In,
>>                                         Buffer &Out);
>>
>>         6. Address ranges and single addresses pointing to removed code
>>            should be marked with a tombstone value in the input file:
>>
>>             -2 for .debug_ranges and .debug_loc.
>>             -1 for other .debug* tables.
>>
>>         7. Prototype implementation - https://reviews.llvm.org/D86539.
>>
>>         ======================================================================
>>
>>         Roadmap:
>>
>>         1. Refactor llvm-objcopy to extract its implementation into a
>>            separate library, ObjCopy (in the LLVM tree).
>>
>>         2. Create a command-line utility using the existing DWARFLinker
>>            and ObjCopy implementations. The first version is supposed to
>>            work with ELF input object files only. It would take an input
>>            ELF file with unoptimized debug info and create an output ELF
>>            file with optimized debug info. That version would be done
>>            out of the LLVM tree.
>>
>>         3. Make the tool able to work in multi-threaded mode.
>>
>>         4. Consider including it in the LLVM tree.
>>
>>         5. Support DWARF5 tables.
>>
>>         ======================================================================
>>
>>         Appendix A. Should this tool be implemented as a new tool or as
>>                      an extension to dsymutil/llvm-objcopy?
>>
>>             There already exists a tool which removes obsolete debug
>>             info on darwin - dsymutil. Why create another tool instead
>>             of extending the existing dsymutil/llvm-objcopy?
>>
>>             The main functionality of dsymutil is located in a separate
>>             library - DWARFLinker. Thus, the dsymutil utility is a
>>             command-line interface for DWARFLinker. dsymutil has a
>>             different type of input/output data: it takes several object
>>             files and an address map as input and creates a .dSYM bundle
>>             with linked debug info as output. llvm-dwarfutil would take
>>             a built executable as input and create an optimized
>>             executable as output. Additionally, there would be many
>>             command-line options specific to only one of the utilities.
>>             This means that these utilities (implementing the
>>             command-line interface) would differ significantly. It makes
>>             sense not to put another command-line utility inside the
>>             existing dsymutil, but to make it a separate utility. That
>>             is why llvm-dwarfutil is suggested to be implemented not as
>>             a sub-part of dsymutil but as a separate tool.
>>
>>             Please share your preference: should llvm-dwarfutil be a
>>             separate utility, or a variant of dsymutil compiled for ELF?
>>
>>         ======================================================================
>>
>>         Appendix B. The MachO object file format is already supported
>>             by dsymutil. Depending on the decision whether
>>             llvm-dwarfutil would be done as a subproject of dsymutil or
>>             as a separate utility, MachO would be supported or not.
>>
>>         ======================================================================
>>
>>         Appendix C. Support for the COFF and WASM object file formats is
>>              presented as a possible future improvement. It would be
>>              quite easy to add them, given that llvm-objcopy already
>>              supports these formats. It would also require supporting
>>              the DWARF6-suggested tombstone values (-1/-2).
>>
>>         ======================================================================
>>
>>         Appendix D. Documentation.
>>
>>            - Proposal for DWARF6 which suggested -1/-2 values for marking
>>              bad addresses:
>>              http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>            - dsymutil tool:
>>              https://llvm.org/docs/CommandGuide/dsymutil.html
>>            - Proposal "Remove obsolete debug info in lld.":
>>              http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>
>>         ======================================================================
>>
>>         Appendix E. Possible command line options:
>>
>>         DwarfUtil Options:
>>
>>            --build-aranges           - generate .debug_aranges table.
>>            --build-debug-names       - generate .debug_names table.
>>            --build-debug-pubnames    - generate .debug_pubnames table.
>>            --build-debug-pubtypes    - generate .debug_pubtypes table.
>>            --build-gdb-index         - generate .gdb_index table.
>>            --compress                - Compress debug tables.
>>            --decompress              - Decompress debug tables.
>>            --deduplicate-types       - Do ODR deduplication for debug types.
>>            --garbage-collect         - Do garbage collection for debug info.
>>            --num-threads=<n>         - Specify the maximum number (n) of
>>                                        simultaneous threads to use when
>>                                        optimizing the input file.
>>                                        Defaults to the number of cores on
>>                                        the current machine.
>>            --strip-all               - Strip all debug tables.
>>            --strip=<name1,name2>     - Strip specified debug info tables.
>>            --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>            --tombstone=<value>       - Tombstone value used as a marker of
>>                                        invalid address.
>>              =bfd                    -   BFD default value
>>              =dwarf6                 -   Dwarf v6.
>>            --verbose                 - Enable verbose logging and encoding
>>                                        details.
>>
>>         Generic Options:
>>
>>            --help                    - Display available options
>>                                        (--help-hidden for more)
>>            --version                 - Display the version of this program
>>
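
P.S. The tombstone convention from implementation note 6 of the quoted
proposal (-2 for .debug_ranges and .debug_loc, -1 for other .debug*
tables) can be sketched as below. The reason for the separate -2 value is
that in .debug_ranges and .debug_loc an all-ones address already marks a
base address selection entry. tombstoneFor and isTombstone are
hypothetical helper names, not part of LLVM, and the sketch assumes 4- or
8-byte target addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: tombstone value for an address of AddrSize bytes
// stored in the given debug section.
std::uint64_t tombstoneFor(const std::string &Section, unsigned AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");
  // -1 in the target address width (all bits set).
  const std::uint64_t MinusOne =
      AddrSize == 8 ? UINT64_MAX : static_cast<std::uint64_t>(UINT32_MAX);
  // -1 is reserved in these two sections, so they use -2 instead.
  const bool UsesMinusTwo =
      Section == ".debug_ranges" || Section == ".debug_loc";
  return UsesMinusTwo ? MinusOne - 1 : MinusOne;
}

// Hypothetical helper: true if Addr marks removed code in Section.
bool isTombstone(std::uint64_t Addr, const std::string &Section,
                 unsigned AddrSize) {
  return Addr == tombstoneFor(Section, AddrSize);
}
```

A consumer that walks address ranges could call isTombstone on the start
address and skip the entry when it returns true.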