[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

James Y Knight via llvm-dev llvm-dev at lists.llvm.org
Fri Aug 28 14:23:38 PDT 2020


If we're designing a new tool and process, it would be wonderful if it did
not require multiple stages of copying and slightly modifying the binary,
in order to create final output with separate debug info. It seems to me
that the variants of this sort of thing which exist today are somewhat
suboptimal.

With Mach-O and dsymutil:
  1. Given a collection of object files (which contain debuginfo), link a
binary with ld. The binary then includes special references to the object
files that were actually used as part of the link.
  2. Given the linked binary, and all of the same object files, link the
debuginfo with dsymutil.
  3. Strip the references to the object file paths from the binary.
  Finally, you have a binary without debug info, and a dsym debuginfo file.
But it would be better if the binary created in step 1 didn't need to
include the extraneous object-file path info, and that was instead emitted
in a second file. Then we wouldn't need step 3.

With "normal" ELF:
  1. Given a collection of object files (which contain debuginfo), link a
binary with ld, which includes linking all the debug info into the binary.
  2. Given the linked binary, objcopy --only-keep-debug to create a new
separated debug file.
  3. Given the linked binary, objcopy --strip-debug to create a copy of the
binary without debug info.
  Finally you have a binary without debug info, and a separate debug file.
But it would be better if the linker could just write the debug info into a
separate file in the first place; then we'd only have the one step. (The
downside is that the linker needs to manage all the debug info, which can
be excessively large.)

With "split-dwarf" ELF support:
  1. Given object files (which exclude *most* but not all of the
debuginfo), link a binary. The binary will include that smaller set of
debug info.
  2. Given the collection of dwo files corresponding to the object
files, run the "dwp" tool to create a dwp file.
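  3. Given the linked binary, objcopy --only-keep-debug to create a new
separated debug file.
  4. Given the linked binary, objcopy --strip-debug to create a copy of the
binary without debug info.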
  And then you need to keep both a debug file *and* a dwp file, which is
weird.


I think, ideally, users would have the following three *good* options:
  Easy option: store debuginfo in the object files, and have the linker
create a pair of {binary, separated dwarf-optimized debuginfo} files
directly from the object files.
  More scalable option: emit (most of the) debuginfo in separate *.dwo
files using -gsplit-dwarf, and then,
    1. run the linker on the object files to create a pair of {binary,
separated debuginfo} files. In this case the latter file contains the
minimal debuginfo which was in the object files.
    2. run a second tool, which reads the minimal debuginfo from above, and
all the DWO files, and creates a full optimized/deduplicated debuginfo
output file.
  Faster developer builds: Like the previous option, but omit step 2. A
debugger run directly after step 1 can use the dwo files on disk.

I think we're not terribly far from that ideal, now, for ELF. Maybe only
these three things need to be done? --
  1. Teach lld how to emit a separated debuginfo output file directly,
without requiring an objcopy step.
  2. Integrate DWARFLinker into lld.
  3. Create a new tool which takes the separated debuginfo and DWO/DWP
files and uses the DWARFLinker library to create a new (dwarf-linked)
separated-debug file that doesn't depend on DWO/DWP files.

My hope is that the tool you're creating will be the implementation of #3,
but I'm afraid the intent is for this tool to be an additional stage that
non-split-dwarf users would need to run post-link, *instead of* integrating
DWARFLinker into lld.

On Tue, Aug 25, 2020 at 10:29 AM Alexey via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi,
>
>    We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>    Any thoughts on this?
>    Thanks in advance, Alexey.
>
> ======================================================================
>
> llvm-dwarfutil (Apndx A) is a tool for processing debug info (DWARF)
> located in built binary files, in order to improve debug info quality,
> reduce debug info size, and accelerate debug info processing.
> Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
> WASM (Apndx C).
>
> ======================================================================
>
> Specifically, the tool would do:
>
>    - Remove obsolete debug info which refers to code deleted by the linker
>      doing the garbage collection (gc-sections).
>
>    - Deduplicate debug type definitions to reduce the size of the
>      resulting binary.
>
>    - Build accelerator/index tables.
>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
> .debug_pubtypes.
>
>    - Strip unneeded tables.
>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
> .debug_pubtypes.
>
>    - Compress or decompress debug info as requested.
>
> Possible feature:
>
>    - Join split dwarf .dwo files in a single file containing all debug info
>      (convert split DWARF into monolithic DWARF).
>
> ======================================================================
>
> User interface:
>
>    OVERVIEW: A tool for optimizing debug info located in the built binary.
>
>    USAGE: llvm-dwarfutil [options] input output
>
>    OPTIONS: (Apndx E)
>
> ======================================================================
>
> Implementation notes:
>
> 1. Removing obsolete debug info would be done using DWARFLinker llvm
> library.
>
> 2. Data types deduplication would be done using DWARFLinker llvm library.
>
> 3. Accelerator/index tables would be generated using DWARFLinker llvm
> library.
>
> 4. The interface of the DWARFLinker library would be changed in such a
>     way that it would be possible to switch various stages on/off:
>
>    class DWARFLinker {
>    public:
>      void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>
>      void setDoAppleNames(bool DoAppleNames = false);
>      void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>      void setDoAppleTypes(bool DoAppleTypes = false);
>      void setDoObjC(bool DoObjC = false);
>      void setDoDebugPubNames(bool DoDebugPubNames = false);
>      void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>
>      void setDoDebugNames(bool DoDebugNames = false);
>      void setDoGDBIndex(bool DoGDBIndex = false);
>    };
>
> 5. Copying source file contents, stripping tables, and
>     compressing/decompressing tables would be done by the ObjCopy llvm
>     library (extracted from llvm-objcopy):
>
>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>                               object::COFFObjectFile &In, Buffer &Out);
>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>                               object::ELFObjectFileBase &In, Buffer &Out);
>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>                               object::MachOObjectFile &In, Buffer &Out);
>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>                               object::WasmObjectFile &In, Buffer &Out);
>
> 6. Address ranges and single addresses pointing to removed code should
>     be marked with a tombstone value in the input file:
>
>     -2 for .debug_ranges and .debug_loc.
>     -1 for other .debug* tables.
>
> 7. Prototype implementation - https://reviews.llvm.org/D86539.
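
To make the tombstone convention in item 6 concrete, here is a minimal
sketch of how I read it. This is not LLVM code: SectionKind, maxAddress,
tombstoneFor and isDeadAddress are hypothetical names, and I am assuming
that -1/-2 mean "all ones" and "all ones minus one" in the target's address
size. As far as I know, -2 is needed for .debug_ranges and .debug_loc
because in DWARF v4 an entry whose first word is all ones already denotes a
base address selection entry there.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of debug sections for tombstone purposes.
enum class SectionKind { RangesOrLoc, Other };

// Largest address representable in AddrSize bytes (AddrSize is 1..8).
constexpr std::uint64_t maxAddress(unsigned AddrSize) {
  return AddrSize >= 8 ? UINT64_MAX
                       : (std::uint64_t{1} << (AddrSize * 8)) - 1;
}

// -2 for .debug_ranges/.debug_loc, -1 for everything else, truncated to
// the target address size.
constexpr std::uint64_t tombstoneFor(SectionKind Kind, unsigned AddrSize) {
  return Kind == SectionKind::RangesOrLoc ? maxAddress(AddrSize) - 1
                                          : maxAddress(AddrSize);
}

// A consumer could use this to skip entries that refer to removed code.
constexpr bool isDeadAddress(std::uint64_t Addr, SectionKind Kind,
                             unsigned AddrSize) {
  return Addr == tombstoneFor(Kind, AddrSize);
}

// Compile-time checks for 64-bit and 32-bit targets.
static_assert(tombstoneFor(SectionKind::Other, 8) == 0xFFFFFFFFFFFFFFFFULL);
static_assert(tombstoneFor(SectionKind::RangesOrLoc, 8) ==
              0xFFFFFFFFFFFFFFFEULL);
static_assert(tombstoneFor(SectionKind::Other, 4) == 0xFFFFFFFFULL);
static_assert(tombstoneFor(SectionKind::RangesOrLoc, 4) == 0xFFFFFFFEULL);
```

The point of keeping the section kind as an explicit parameter is that a
tool consuming the linker's output has to know which of the two values to
look for in each table.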
>
> ======================================================================
>
> Roadmap:
>
> 1. Refactor llvm-objcopy to extract its implementation into a separate
>     library, ObjCopy (in the LLVM tree).
>
> 2. Create a command-line utility using the existing DWARFLinker and
>     ObjCopy implementations. The first version is supposed to work only
>     with ELF input object files. It would take an input ELF file with
>     unoptimized debug info and create an output ELF file with optimized
>     debug info. That version would be done out of the LLVM tree.
>
> 3. Make the tool able to work in multi-threaded mode.
>
> 4. Consider including it in the LLVM tree.
>
> 5. Support DWARF5 tables.
>
> ======================================================================
>
> Appendix A. Should this tool be implemented as a new tool or as an
> extension
>              to dsymutil/llvm-objcopy?
>
>     A tool that removes obsolete debug info already exists on darwin:
>     dsymutil. Why create another tool instead of extending the existing
>     dsymutil/llvm-objcopy?
>
>     The main functionality of dsymutil is located in a separate library,
>     DWARFLinker. Thus, the dsymutil utility is a command-line interface
>     for DWARFLinker. dsymutil has a different kind of input/output data:
>     it takes several object files and an address map as input and creates
>     a .dSYM bundle with linked debug info as output. llvm-dwarfutil would
>     take a built executable as input and create an optimized executable
>     as output. Additionally, there would be many command-line options
>     specific to only one of the utilities. This means that these
>     utilities (implementing the command-line interface) would differ
>     significantly. It makes sense not to put another command-line utility
>     inside the existing dsymutil, but to make it a separate utility. That
>     is why llvm-dwarfutil is suggested to be implemented not as a
>     sub-part of dsymutil but as a separate tool.
>
>     Please share your preference: whether llvm-dwarfutil should be
>     separate utility, or a variant of dsymutil compiled for ELF?
>
> ======================================================================
>
> Appendix B. The MachO object file format is already supported by dsymutil.
>     Whether llvm-dwarfutil supports MachO depends on the decision whether
>     it is done as a subproject of dsymutil or as a separate utility.
>
> ======================================================================
>
> Appendix C. Support for the COFF and WASM object file formats is presented
>      as a possible future improvement. It would be quite easy to add them,
>      given that llvm-objcopy already supports these formats. It would also
>      require supporting the DWARF6-suggested tombstone values (-1/-2).
>
> ======================================================================
>
> Appendix D. Documentation.
>
>    - proposal for DWARF6 which suggested -1/-2 values for marking bad
> addresses
>      http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>    - dsymutil tool https://llvm.org/docs/CommandGuide/dsymutil.html.
>    - proposal "Remove obsolete debug info in lld."
> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>
> ======================================================================
>
> Appendix E. Possible command line options:
>
> DwarfUtil Options:
>
>    --build-aranges           - generate .debug_aranges table.
>    --build-debug-names       - generate .debug_names table.
>    --build-debug-pubnames    - generate .debug_pubnames table.
>    --build-debug-pubtypes    - generate .debug_pubtypes table.
>    --build-gdb-index         - generate .gdb_index table.
>    --compress                - Compress debug tables.
>    --decompress              - Decompress debug tables.
>    --deduplicate-types       - Do ODR deduplication for debug types.
>    --garbage-collect         - Do garbage collecting for debug info.
>    --num-threads=<n>         - Specify the maximum number (n) of
>                                simultaneous threads to use when
>                                optimizing input file. Defaults to the
>                                number of cores on the current machine.
>    --strip-all               - Strip all debug tables.
>    --strip=<name1,name2>     - Strip specified debug info tables.
>    --strip-unoptimized-debug - Strip all unoptimized debug tables.
>    --tombstone=<value>       - Tombstone value used as a marker of
>                                invalid address.
>      =bfd                    -   BFD default value
>      =dwarf6                 -   Dwarf v6.
>    --verbose                 - Enable verbose logging and encoding details.
>
> Generic Options:
>
>    --help                    - Display available options (--help-hidden
>                                for more)
>    --version                 - Display the version of this program
>