[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

Alexey via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 2 15:26:32 PDT 2020


On 02.09.2020 21:44, David Blaikie wrote:
>
>
> On Wed, Sep 2, 2020 at 9:56 AM Alexey <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>
>     On 01.09.2020 20:07, David Blaikie wrote:
>>     Fair enough - thanks for clarifying the differences! (I'd still
>>     lean a bit towards this being dwz-esque, as you say "an extension
>>     of classic dwz"
>     I have some doubts about "llvm-dwz", since it might confuse people
>     who would expect exactly the same behavior as dwz.
>     But if we think of it as "an extension of classic dwz" and the
>     possible confusion is not a big deal, then
>     I would be fine with "llvm-dwz".
>>     using a bit more domain knowledge (of terminators and C++ odr -
>>     though I'm not sure dsymutil does rely on the ODR, does it? It
>>     relies on it to know that two names represent the same type, I
>>     suppose, but doesn't assume they're already identical, instead it
>>     merges their members))
>
>     If dsymutil is able to find a full definition, it would remove
>     all other definitions (those matched by name) and redirect all
>     references to the definition it found. If it is not able to find a
>     full definition, it would do nothing; i.e., if there are two
>     incomplete definitions (DW_AT_declaration (true)) with the same
>     name, they would not be merged. Teaching dsymutil to merge
>     incomplete types is a possible improvement.
>
> Huh, what does it do with extra member function definitions found in 
> later definitions? (eg: struct x { template<typename T> void f(); }; - 
> in one translation unit x::f<int> is instantiated, in another 
> x::f<float> is instantiated - how are the two represented with dsymutil?)

They would be treated as two non-matching types. dsymutil would not
merge them and thus would not use a single type description. There
would be two separate types called "x" whose members mostly match
but differ in x::f<int> and x::f<float>. There would be no
de-duplication in that case.
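
To make the matching rule concrete, here is a toy C++ model. It is not
dsymutil or DWARFLinker code: TypeDesc, countAfterDedup and
checkTemplateMemberExample are hypothetical names invented for this
illustration. The sketch assumes that two full definitions match only when
both the name and the member list are identical, and that declarations are
simply kept as they are. What happens to a declaration when a matching full
definition exists is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a type description: a name, a declaration flag, and the
// names of its members. Hypothetical, for illustration only.
struct TypeDesc {
  std::string Name;
  bool IsDeclaration;
  std::set<std::string> Members;
};

// Counts the type descriptions that remain after the toy de-duplication.
// Assumption: full definitions are merged only if name AND members match.
std::size_t countAfterDedup(const std::vector<TypeDesc> &Types) {
  std::set<std::pair<std::string, std::set<std::string>>> SeenDefinitions;
  std::size_t Kept = 0;
  for (const TypeDesc &T : Types) {
    if (T.IsDeclaration) {
      ++Kept; // Incomplete types are never merged in this model.
      continue;
    }
    // emplace() reports whether this (name, members) pair is new.
    if (SeenDefinitions.emplace(T.Name, T.Members).second)
      ++Kept;
  }
  return Kept;
}

// The struct x example: two units instantiate x::f<int>, one x::f<float>.
void checkTemplateMemberExample() {
  std::vector<TypeDesc> Types = {
      {"x", false, {"f<int>"}},
      {"x", false, {"f<float>"}},
      {"x", false, {"f<int>"}},
  };
  // The two identical definitions collapse; the f<float> variant stays.
  assert(countAfterDedup(Types) == 2);
}
```

In this model the f<int> and f<float> variants of "x" stay separate, while
the two identical f<int> definitions collapse into one.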


>     Alexey.
>
>>
>>     But I don't have super strong feelings about the naming.
>>
>>     On Tue, Sep 1, 2020 at 6:36 AM Alexey <avl.lapshin at gmail.com
>>     <mailto:avl.lapshin at gmail.com>> wrote:
>>
>>
>>         On 01.09.2020 06:27, David Blaikie wrote:
>>>         A quick note: The feature as currently proposed sounds like
>>>         it's an exact match for 'dwz'? Is there any benefit to this
>>>         over the existing dwz project? Is it different in some ways
>>>         I'm not aware of? (I haven't actually used dwz, so I might
>>>         have some mistaken ideas about how it should work)
>>>
>>>         If it's going to solve the same general problem, but be in
>>>         the llvm project instead, then maybe it should be called
>>>         llvm-dwz.
>>         It looks like dwz and llvm-dwarfutil do not match exactly
>>         in functionality.
>>
>>         dwz is a program that attempts to optimize DWARF debugging
>>         information contained in ELF shared libraries and ELF
>>         executables for *size*.
>>
>>         llvm-dwarfutil is a tool that is used for processing debug
>>         info (DWARF) located in built binary files to improve debug
>>         info *quality*, reduce debug info *size* and accelerate debug
>>         info *processing*.
>>
>>         Things which are supposed to be done by llvm-dwarfutil and
>>         which are not done by dwz: removing obsolete debug info,
>>         building indexes, stripping unneeded debug sections, and
>>         compressing/decompressing debug sections.
>>
>>         What the two tools have in common is that both reduce debug
>>         info size. But they do this using different approaches:
>>
>>         1. dwz reduces the size of debug info by creating partial
>>            compilation units for duplicated parts, so that these
>>            partial compilation units can be imported in every
>>            duplicated place. AFAIU, that optimization gives the
>>            largest size saving.
>>
>>            Another size-saving optimization is ODR type deduplication.
>>
>>         2. llvm-dwarfutil reduces the size of debug info by ODR type
>>            deduplication, which gives the largest size saving in the
>>            llvm-dwarfutil case.
>>
>>            Another size-saving optimization is removing obsolete
>>            debug info (which is actually not only about size but
>>            also about correctness).
>>
>>         So, it looks like these tools are not equal. If we consider
>>         llvm-dwz to be an extension of classic dwz, then we could
>>         probably name it llvm-dwz.
>>
>>
>>>
>>>         Though I understand the desire for this to grow other
>>>         functionality, like DWARF-aware dwp-ing. Might be better for
>>>         this to busybox and provide that functionality under
>>>         llvm-dwp instead, or more likely, I suspect, that the
>>>         existing llvm-dwp will be rewritten (probably by me) to use
>>>         more of lld's infrastructure to be more efficient (its
>>>         current object reading/writing logic is using LLVM's
>>>         libObject and MCStreamer, which is a bit inefficient for a
>>>         very content-unaware linking process) and then maybe that
>>>         could be taught to use DwarfLinker as a library to
>>>         optionally do DWARF-aware linking depending on the user's
>>>         time/space tradeoff desires. Still benefiting from any
>>>         improvements to the underlying DwarfLinker library (at which
>>>         point that would be shared between llvm-dsymutil, llvm-dwz,
>>>         and llvm-dwp).
>>>
>>>         On Tue, Aug 25, 2020 at 7:29 AM Alexey
>>>         <avl.lapshin at gmail.com <mailto:avl.lapshin at gmail.com>> wrote:
>>>
>>>             Hi,
>>>
>>>                We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>>>                Any thoughts on this?
>>>                Thanks in advance, Alexey.
>>>
>>>             ======================================================================
>>>
>>>             llvm-dwarfutil (Apndx A) is a tool that is used for processing
>>>             debug info (DWARF) located in built binary files to improve
>>>             debug info quality, reduce debug info size and accelerate
>>>             debug info processing.
>>>             Supported object file formats: ELF, MachO (Apndx B),
>>>             COFF (Apndx C), WASM (Apndx C).
>>>
>>>             ======================================================================
>>>
>>>             Specifically, the tool would do:
>>>
>>>                - Remove obsolete debug info which refers to code deleted
>>>                  by the linker doing the garbage collection (gc-sections).
>>>
>>>                - Deduplicate debug type definitions to reduce the
>>>                  resulting size of the binary.
>>>
>>>                - Build accelerator/index tables.
>>>                  = .debug_aranges, .debug_names, .gdb_index,
>>>                    .debug_pubnames, .debug_pubtypes.
>>>
>>>                - Strip unneeded tables.
>>>                  = .debug_aranges, .debug_names, .gdb_index,
>>>                    .debug_pubnames, .debug_pubtypes.
>>>
>>>                - Compress or decompress debug info as requested.
>>>
>>>             Possible feature:
>>>
>>>                - Join split dwarf .dwo files in a single file
>>>             containing all debug info
>>>                  (convert split DWARF into monolithic DWARF).
>>>
>>>             ======================================================================
>>>
>>>             User interface:
>>>
>>>                OVERVIEW: A tool for optimizing debug info located in
>>>             the built binary.
>>>
>>>                USAGE: llvm-dwarfutil [options] input output
>>>
>>>                OPTIONS: (Apndx E)
>>>
>>>             ======================================================================
>>>
>>>             Implementation notes:
>>>
>>>             1. Removing obsolete debug info would be done using the
>>>                 DWARFLinker llvm library.
>>>
>>>             2. Data types deduplication would be done using the
>>>                 DWARFLinker llvm library.
>>>
>>>             3. Accelerator/index tables would be generated using the
>>>                 DWARFLinker llvm library.
>>>
>>>             4. The interface of the DWARFLinker library would be changed
>>>                 in such a way that it would be possible to switch
>>>                 various stages on/off:
>>>
>>>                class DWARFLinker {
>>>                public:
>>>                  // Return types are not given in the proposal; void is
>>>                  // assumed here.
>>>                  void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>>
>>>                  // Apple accelerator tables.
>>>                  void setDoAppleNames(bool DoAppleNames = false);
>>>                  void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>>                  void setDoAppleTypes(bool DoAppleTypes = false);
>>>                  void setDoObjC(bool DoObjC = false);
>>>                  void setDoDebugPubNames(bool DoDebugPubNames = false);
>>>                  void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>>
>>>                  void setDoDebugNames(bool DoDebugNames = false);
>>>                  void setDoGDBIndex(bool DoGDBIndex = false);
>>>                };
>>>
>>>             5. Copying source file contents, stripping tables,
>>>             compressing/decompressing tables
>>>                 would be done by ObjCopy llvm library(extracted from
>>>             llvm-objcopy):
>>>
>>>                Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>             object::COFFObjectFile &In, Buffer &Out);
>>>                Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>             object::ELFObjectFileBase &In, Buffer &Out);
>>>                Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>             object::MachOObjectFile &In, Buffer &Out);
>>>                Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>             object::WasmObjectFile &In, Buffer &Out);
>>>
>>>             6. Address ranges and single addresses pointing to removed
>>>                 code should be marked with a tombstone value in the
>>>                 input file:
>>>
>>>                 -2 for .debug_ranges and .debug_loc.
>>>                 -1 for other .debug* tables.
>>>
>>>             7. Prototype implementation -
>>>             https://reviews.llvm.org/D86539.
>>>
>>>             ======================================================================
>>>
>>>             Roadmap:
>>>
>>>             1. Refactor llvm-objcopy to extract its implementation into a
>>>                 separate library, ObjCopy (in the LLVM tree).
>>>
>>>             2. Create a command-line utility using the existing
>>>                 DWARFLinker and ObjCopy implementations. The first
>>>                 version is supposed to work only with ELF input object
>>>                 files. It would take an input ELF file with unoptimized
>>>                 debug info and create an output ELF file with optimized
>>>                 debug info. That version would be done out of the
>>>                 LLVM tree.
>>>
>>>             3. Make the tool able to work in multi-threaded mode.
>>>
>>>             4. Consider including it in the LLVM tree.
>>>
>>>             5. Support DWARF5 tables.
>>>
>>>             ======================================================================
>>>
>>>             Appendix A. Should this tool be implemented as a new
>>>             tool or as an extension
>>>                          to dsymutil/llvm-objcopy?
>>>
>>>                 There already exists a tool which removes obsolete debug
>>>                 info on darwin - dsymutil. Why create another tool
>>>                 instead of extending the existing dsymutil/llvm-objcopy?
>>>
>>>                 The main functionality of dsymutil is located in a
>>>                 separate library - DWARFLinker. Thus, the dsymutil
>>>                 utility is a command-line interface for DWARFLinker.
>>>                 dsymutil has a different type of input/output data: it
>>>                 takes several object files and an address map as input
>>>                 and creates a .dSYM bundle with linked debug info as
>>>                 output. llvm-dwarfutil would take a built executable as
>>>                 input and create an optimized executable as output.
>>>                 Additionally, there would be many command-line options
>>>                 specific to only one utility. This means that these
>>>                 utilities (implementing the command-line interface)
>>>                 would differ significantly. It makes sense not to put
>>>                 another command-line utility inside the existing
>>>                 dsymutil, but to make it a separate utility. That is the
>>>                 reason why llvm-dwarfutil is suggested to be implemented
>>>                 not as a sub-part of dsymutil but as a separate tool.
>>>
>>>                 Please share your preference: should llvm-dwarfutil be
>>>                 a separate utility, or a variant of dsymutil compiled
>>>                 for ELF?
>>>
>>>             ======================================================================
>>>
>>>             Appendix B. The MachO object file format is already
>>>             supported by dsymutil.
>>>                 Whether MachO would be supported by llvm-dwarfutil
>>>                 depends on the decision to implement it as a subproject
>>>                 of dsymutil or as a separate utility.
>>>
>>>             ======================================================================
>>>
>>>             Appendix C. Support for the COFF and WASM object file
>>>             formats is presented as
>>>                  a possible future improvement. It would be quite easy
>>>                  to add them, given that llvm-objcopy already supports
>>>                  these formats. It would also require supporting the
>>>                  DWARF6-suggested tombstone values (-1/-2).
>>>
>>>             ======================================================================
>>>
>>>             Appendix D. Documentation.
>>>
>>>                - proposal for DWARF6 which suggested -1/-2 values
>>>             for marking bad
>>>             addresses
>>>             http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>>                - dsymutil tool
>>>             https://llvm.org/docs/CommandGuide/dsymutil.html.
>>>                - proposal "Remove obsolete debug info in lld."
>>>             http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>>
>>>             ======================================================================
>>>
>>>             Appendix E. Possible command line options:
>>>
>>>             DwarfUtil Options:
>>>
>>>                --build-aranges           - Generate .debug_aranges table.
>>>                --build-debug-names       - Generate .debug_names table.
>>>                --build-debug-pubnames    - Generate .debug_pubnames table.
>>>                --build-debug-pubtypes    - Generate .debug_pubtypes table.
>>>                --build-gdb-index         - Generate .gdb_index table.
>>>                --compress                - Compress debug tables.
>>>                --decompress              - Decompress debug tables.
>>>                --deduplicate-types       - Do ODR deduplication for debug types.
>>>                --garbage-collect         - Do garbage collecting for debug info.
>>>                --num-threads=<n>         - Specify the maximum number (n) of
>>>                                            simultaneous threads to use when
>>>                                            optimizing input file. Defaults to
>>>                                            the number of cores on the current
>>>                                            machine.
>>>                --strip-all               - Strip all debug tables.
>>>                --strip=<name1,name2>     - Strip specified debug info tables.
>>>                --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>>                --tombstone=<value>       - Tombstone value used as a marker of
>>>                                            invalid address.
>>>                  =bfd                    -   BFD default value
>>>                  =dwarf6                 -   Dwarf v6.
>>>                --verbose                 - Enable verbose logging and encoding
>>>                                            details.
>>>
>>>             Generic Options:
>>>
>>>                --help                    - Display available options
>>>             (--help-hidden
>>>             for more)
>>>                --version                 - Display the version of
>>>             this program
>>>
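
A note on item 6 of the quoted proposal (tombstone values). The following is
a minimal sketch, not LLVM code: tombstoneFor is a hypothetical helper, and
taking the address size as a parameter is my assumption about how the -1/-2
values would be materialized for a given target. As far as I understand, -2
is needed for .debug_ranges and .debug_loc because the maximum address value
in those sections already marks a base address selection entry. The sketch
needs C++17 for std::string_view.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not an LLVM API): returns the tombstone value for an
// address of AddrSize bytes in the given debug section, following item 6:
// -2 for .debug_ranges and .debug_loc, -1 for other .debug* tables.
inline std::uint64_t tombstoneFor(std::string_view Section,
                                  unsigned AddrSize) {
  assert(AddrSize >= 1 && AddrSize <= 8 && "unsupported address size");
  // All-ones value of the requested width, i.e. -1 as an unsigned address.
  // The 8-byte case is handled separately to avoid shifting by 64 bits.
  const std::uint64_t MinusOne =
      AddrSize == 8 ? ~std::uint64_t{0}
                    : (std::uint64_t{1} << (AddrSize * 8)) - 1;
  const bool UsesMinusTwo =
      Section == ".debug_ranges" || Section == ".debug_loc";
  return UsesMinusTwo ? MinusOne - 1 : MinusOne;
}
```

For example, tombstoneFor(".debug_ranges", 8) yields 0xFFFFFFFFFFFFFFFE,
while tombstoneFor(".debug_info", 4) yields 0xFFFFFFFF.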