[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 31 20:24:15 PDT 2020

On Fri, Aug 28, 2020 at 2:24 PM James Y Knight <jyknight at google.com> wrote:

> If we're designing a new tool and process, it would be wonderful if it did
> not require multiple stages of copying and slightly modifying the binary,
> in order to create final output with separate debug info. It seems to me
> that the variants of this sort of thing which exist today are somewhat
> suboptimal.
> With Mach-O and dsymutil:
>   1. Given a collection of object files (which contain debuginfo), link a
> binary with ld. The binary then includes special references to the object
> files that were actually used as part of the link.
>   2. Given the linked binary, and all of the same object files, link the
> debuginfo with dsymutil.
>   3. Strip the references to the object file paths from the binary.
>   Finally, you have a binary without debug info, and a dsym debuginfo
> file. But it would be better if the binary created in step 1 didn't need to
> include the extraneous object-file path info, and that was instead emitted
> in a second file. Then we wouldn't need step 3.
> With "normal" ELF:
>   1. Given a collection of object files (which contain debuginfo), link a
> binary with ld, which includes linking all the debug info into the binary.
>   2. Given the linked binary, objcopy --only-keep-debug to create a new
> separated debug file.
>   3. Given the linked binary, objcopy --strip-debug to create a copy of
> the binary without debug info.
>   Finally you have a binary without debug info, and a separate debug file.
> But it would be better if the linker could just write the debug info into a
> separate file in the first place, then we'd only have the one step. (But,
> downside, the linker needs to manage all the debug info, which can be
> excessively large.)
> With "split-dwarf" ELF support:
>   1. Given object files (which exclude *most* but not all of the
> debuginfo), link a binary. The binary will include that smaller set of
> debug info.
>   2. Given the collection of dwo files corresponding to the object
> files, run the "dwp" tool to create a dwp file.
>   3. objcopy --only-keep-debug
>   4. objcopy --strip-debug
>   And then you need to keep both a debug file *and* a dwp file, which is
> weird.
> I think, ideally, users would have the following three *good* options:
>   Easy option: store debuginfo in the object files, and have the linker
> create a pair of {binary, separated dwarf-optimized debuginfo} files
> directly from the object files.

(as discussed in other replies - that was an early proposal; it didn't gain a
lot of traction, and Eric & Ray weren't super convinced it was worth adding to
lld at this stage, given the link time cost & thus the small expected user
base)

>   More scalable option: emit (most of the) debuginfo in separate *.dwo
> files using -gsplit-dwarf, and then,
>     1. run the linker on the object files to create a pair of {binary,
> separated debuginfo} files. In this case the latter file contains the
> minimal debuginfo which was in the object files.

Yeah, that ^ is probably a nice feature regardless. Save folks an extra
objcopy, etc. Usable right now for any build that is already running

>     2. run a second tool, which reads the minimal debuginfo from above,
> and all the DWO files, and creates a full optimized/deduplicated debuginfo
> output file.

Fair - this then looks a lot like the MachO debug info distribution/linking
model (with the advantage that the DWARF isn't in the .o files, so doesn't
have to be shipped to the machine doing the linking), so far as I know.

>   Faster developer builds: Like previous, but omit step 2 -- running the
> debugger directly after step 1 can use the dwo files on-disk.
> I think we're not terribly far from that ideal, now, for ELF. Maybe only
> these three things need to be done? --
>   1. Teach lld how to emit a separated debuginfo output file directly,
> without requiring an objcopy step.
>   2. Integrate DWARFLinker into lld.
>   3. Create a new tool which takes the separated debuginfo and DWO/DWP
> files and uses DWARFLinker library to create a new (dwarf-linked)
> separated-debug file, that doesn't depend on DWO/DWP files.
> My hope is that the tool you're creating will be the implementation of #3,
> but I'm afraid the intent is for this tool to be an additional stage that
> non-split-dwarf users would need to run post-link, *instead of*
> integrating DWARFLinker into lld.

Yeah, that's the direction lld folks have pushed for - a post-processing
step, rather than link-time. Mostly because DWARF-aware linking is currently
quite slow, so the expectation is that not many users would be willing
to take that link-time performance hit to use the feature. (whereas as a
post-processing step before archiving DWARF (like building a dwp from dwo
files) it might be more appealing/interesting - and maybe with sufficient
performance improvements, could then be rolled into lld as originally
proposed)
Curiously, Alexey's needs include not wanting to use fission, because a
single debuggable binary simplifies his users' use case/is easier to
distribute than two files. So he's probably not interested in the
strip-debug/only-keep-debug kind of debug info distribution model, at least
for his own users/use case, so far as I understand it.

I've got mixed feelings about that - and encourage you to
express/clarify/discuss your thoughts here, as I think the whole
conversation could use some more voices.

- Dave

> On Tue, Aug 25, 2020 at 10:29 AM Alexey via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>> Hi,
>>    We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>>    Any thoughts on this?
>>    Thanks in advance, Alexey.
>> ======================================================================
>> llvm-dwarfutil (Appendix A) is a tool for processing debug info (DWARF)
>> located in built binary files, to improve debug info quality, reduce
>> debug info size and accelerate debug info processing.
>> Supported object file formats: ELF, MachO (Appendix B), COFF (Appendix C),
>> WASM (Appendix C).
>> ======================================================================
>> Specifically, the tool would:
>>    - Remove obsolete debug info which refers to code deleted by the linker
>>      doing garbage collection (gc-sections).
>>    - Deduplicate debug type definitions to reduce the resulting binary size.
>>    - Build accelerator/index tables.
>>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>        .debug_pubtypes.
>>    - Strip unneeded tables.
>>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>        .debug_pubtypes.
>>    - Compress or decompress debug info as requested.
>> Possible feature:
>>    - Join split DWARF .dwo files into a single file containing all debug
>>      info (convert split DWARF into monolithic DWARF).
>> ======================================================================
>> User interface:
>>    OVERVIEW: A tool for optimizing debug info located in the built binary.
>>    USAGE: llvm-dwarfutil [options] input output
>>    OPTIONS: (Apndx E)
>> ======================================================================
>> Implementation notes:
>> 1. Removing obsolete debug info would be done using the DWARFLinker LLVM
>>    library.
>> 2. Data type deduplication would be done using the DWARFLinker LLVM library.
>> 3. Accelerator/index tables would be generated using the DWARFLinker LLVM
>>    library.
>> 4. The interface of the DWARFLinker library would be changed so that
>>    various stages could be switched on/off:
>>    class DWARFLinker {
>>    public:
>>      void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>      void setDoAppleNames(bool DoAppleNames = false);
>>      void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>      void setDoAppleTypes(bool DoAppleTypes = false);
>>      void setDoObjC(bool DoObjC = false);
>>      void setDoDebugPubNames(bool DoDebugPubNames = false);
>>      void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>      void setDoDebugNames(bool DoDebugNames = false);
>>      void setDoGDBIndex(bool DoGDBIndex = false);
>>    };
>> 5. Copying source file contents, stripping tables and
>>    compressing/decompressing tables would be done by the ObjCopy LLVM
>>    library (extracted from llvm-objcopy):
>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                 object::COFFObjectFile &In, Buffer &Out);
>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                 object::ELFObjectFileBase &In, Buffer &Out);
>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                 object::MachOObjectFile &In, Buffer &Out);
>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                 object::WasmObjectFile &In, Buffer &Out);
>> 6. Address ranges and single addresses pointing to removed code should be
>>    marked with a tombstone value in the input file:
>>     -2 for .debug_ranges and .debug_loc;
>>     -1 for other .debug* tables.
>> 7. Prototype implementation - https://reviews.llvm.org/D86539.
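Note 6 presumably means -1/-2 as all-ones and all-ones-minus-one in the unit's address size. A small C++ sketch of that reading follows. `tombstoneFor` and `isTombstoned` are hypothetical helpers, not an existing LLVM API. The -2 exception exists because in DWARF v4 .debug_ranges and .debug_loc lists, an entry whose start address is the largest representable address is a base address selection entry, so -1 would be misread there.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical helper: the proposed tombstone for an address of AddrSize
// bytes (4 or 8) read from Section: -2 for .debug_ranges/.debug_loc and
// -1 for every other .debug* section.
std::uint64_t tombstoneFor(std::string_view Section, unsigned AddrSize) {
  const std::uint64_t Max = AddrSize == 4 ? UINT32_MAX : UINT64_MAX;
  // In DWARF v4 range/location lists a start address of Max marks a base
  // address selection entry, so Max - 1 is used to avoid ambiguity.
  if (Section == ".debug_ranges" || Section == ".debug_loc")
    return Max - 1;
  return Max;
}

// Hypothetical helper: true if Addr, as read from Section, refers to code
// that the linker removed (e.g. via --gc-sections).
bool isTombstoned(std::string_view Section, unsigned AddrSize,
                  std::uint64_t Addr) {
  return Addr == tombstoneFor(Section, AddrSize);
}
```

A tool like the one proposed could use such a check when walking DW_AT_low_pc values or range-list entries to decide which debug info describes dead code.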
>> ======================================================================
>> Roadmap:
>> 1. Refactor llvm-objcopy to extract its implementation into a separate
>>    library, ObjCopy (in the LLVM tree).
>> 2. Create a command line utility using the existing DWARFLinker and ObjCopy
>>    implementations. The first version is supposed to work only with ELF
>>    input object files. It would take an input ELF file with unoptimized
>>    debug info and create an output ELF file with optimized debug info.
>>    That version would be developed outside the LLVM tree.
>> 3. Make the tool able to work in multi-threaded mode.
>> 4. Consider including it in the LLVM tree.
>> 5. Support DWARF5 tables.
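Roadmap item 3 and the `--num-threads` option in Appendix E suggest per-compile-unit parallelism. The generic C++ sketch below shows one way to cap work at N threads, with a default of the core count. It is not DWARFLinker's actual threading design. `forEachCUParallel` and `defaultNumThreads` are hypothetical names. Any shared state touched by the per-CU callback, such as a cross-CU type table for deduplication, would need synchronization that this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical default for --num-threads: the number of cores, falling back
// to 1 when the standard library cannot determine it (it may return 0).
unsigned defaultNumThreads() {
  unsigned N = std::thread::hardware_concurrency();
  return N == 0 ? 1 : N;
}

// Runs ProcessCU(I) for every compile unit index I in [0, NumCUs) using at
// most NumThreads worker threads; worker T handles T, T + NumThreads, ...
// so each index is processed by exactly one thread.
void forEachCUParallel(std::size_t NumCUs, unsigned NumThreads,
                       const std::function<void(std::size_t)> &ProcessCU) {
  if (NumThreads == 0)
    NumThreads = 1;
  std::vector<std::thread> Workers;
  for (unsigned T = 0; T < NumThreads && T < NumCUs; ++T)
    Workers.emplace_back([=, &ProcessCU] {
      for (std::size_t I = T; I < NumCUs; I += NumThreads)
        ProcessCU(I);
    });
  // Wait for all workers before returning.
  for (std::thread &W : Workers)
    W.join();
}
```

Striding by thread index keeps the sketch simple. A real implementation might prefer a work queue, since compile units vary widely in size.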
>> ======================================================================
>> Appendix A. Should this tool be implemented as a new tool or as an
>>              extension to dsymutil/llvm-objcopy?
>>     There already exists a tool which removes obsolete debug info on
>>     Darwin: dsymutil. Why create another tool instead of extending the
>>     existing dsymutil/llvm-objcopy?
>>     The main functionality of dsymutil is located in a separate library,
>>     DWARFLinker. Thus, the dsymutil utility is a command-line interface for
>>     DWARFLinker. dsymutil has a different type of input/output data: it
>>     takes several object files and an address map as input and creates a
>>     .dSYM bundle with linked debug info as output. llvm-dwarfutil would
>>     take a built executable as input and create an optimized executable as
>>     output. Additionally, there would be many command-line options specific
>>     to only one utility. This means that these utilities (implementing the
>>     command-line interface) would differ significantly. It makes sense not
>>     to put another command-line utility inside the existing dsymutil, but
>>     to make it a separate utility. That is why llvm-dwarfutil is suggested
>>     to be implemented as a separate tool rather than as a sub-part of
>>     dsymutil.
>>     Please share your preference: should llvm-dwarfutil be a separate
>>     utility, or a variant of dsymutil compiled for ELF?
>> ======================================================================
>> Appendix B. The MachO object file format is already supported by dsymutil.
>>     Depending on whether llvm-dwarfutil is done as a subproject of
>>     dsymutil or as a separate utility, MachO would or would not be
>>     supported.
>> ======================================================================
>> Appendix C. Support for the COFF and WASM object file formats is presented
>>      as a possible future improvement. It would be quite easy to add them,
>>      given that llvm-objcopy already supports these formats. It would also
>>      require supporting the DWARF6-suggested tombstone values (-1/-2).
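Appendices B and C hinge on which object format an input is, since implementation note 5 has one executeObjcopyOnBinary overload per format. As an illustration, here is a C++ sketch that classifies a buffer by its leading magic bytes: ELF `\x7fELF`, the 32/64-bit Mach-O magic numbers (0xFEEDFACE/0xFEEDFACF) in either byte order, and the Wasm `\0asm` header. `detectFormat` and `ObjFormat` are hypothetical. Real LLVM code would more likely use `llvm::object::createBinary` and check the resulting object's type. COFF objects have no fixed magic (they start with a machine type), and universal (fat) Mach-O files are not handled, so both come back as `Unknown` here.

```cpp
#include <cassert>
#include <initializer_list>
#include <string_view>

enum class ObjFormat { ELF, MachO, Wasm, Unknown };

// Hypothetical classifier based only on leading magic bytes.
ObjFormat detectFormat(std::string_view Buf) {
  auto StartsWith = [Buf](std::string_view Magic) {
    return Buf.size() >= Magic.size() && Buf.substr(0, Magic.size()) == Magic;
  };
  if (StartsWith(std::string_view("\x7f" "ELF", 4)))
    return ObjFormat::ELF;
  // 32- and 64-bit Mach-O magic numbers, little- and big-endian layouts.
  for (std::string_view M : {std::string_view("\xce\xfa\xed\xfe", 4),
                             std::string_view("\xcf\xfa\xed\xfe", 4),
                             std::string_view("\xfe\xed\xfa\xce", 4),
                             std::string_view("\xfe\xed\xfa\xcf", 4)})
    if (StartsWith(M))
      return ObjFormat::MachO;
  // Wasm binaries start with "\0asm" followed by a 4-byte version.
  if (StartsWith(std::string_view("\0asm", 4)))
    return ObjFormat::Wasm;
  return ObjFormat::Unknown;
}
```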
>> ======================================================================
>> Appendix D. Documentation.
>>    - Proposal for DWARF6 which suggested -1/-2 values for marking bad
>>      addresses: http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>    - dsymutil tool: https://llvm.org/docs/CommandGuide/dsymutil.html
>>    - Proposal "Remove obsolete debug info in lld.":
>>      http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>> ======================================================================
>> Appendix E. Possible command line options:
>> DwarfUtil Options:
>>    --build-aranges           - Generate .debug_aranges table.
>>    --build-debug-names       - Generate .debug_names table.
>>    --build-debug-pubnames    - Generate .debug_pubnames table.
>>    --build-debug-pubtypes    - Generate .debug_pubtypes table.
>>    --build-gdb-index         - Generate .gdb_index table.
>>    --compress                - Compress debug tables.
>>    --decompress              - Decompress debug tables.
>>    --deduplicate-types       - Do ODR deduplication for debug types.
>>    --garbage-collect         - Do garbage collection for debug info.
>>    --num-threads=<n>         - Specify the maximum number (n) of
>>                                simultaneous threads to use when optimizing
>>                                the input file. Defaults to the number of
>>                                cores on the current machine.
>>    --strip-all               - Strip all debug tables.
>>    --strip=<name1,name2>     - Strip the specified debug info tables.
>>    --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>    --tombstone=<value>       - Tombstone value used as a marker of an
>>                                invalid address:
>>      =bfd                    -   BFD default value.
>>      =dwarf6                 -   DWARF v6 value.
>>    --verbose                 - Enable verbose logging and encoding details.
>> Generic Options:
>>    --help                    - Display available options (--help-hidden
>>                                for more).
>>    --version                 - Display the version of this program.
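One way the Appendix E options could drive the stage setters proposed in implementation note 4 is sketched below in C++. `ProposedLinkerStages`, `applyFlag` and `parseStripList` are hypothetical stand-ins, not LLVM APIs. Mapping `--garbage-collect` to `setDoRemoveObsoleteInfo` is an assumption. `--build-aranges` and `--deduplicate-types` have no matching setter in note 4, so the sketch leaves them unhandled.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the proposed DWARFLinker setters (note 4);
// it only records which optional stages were requested.
struct ProposedLinkerStages {
  bool RemoveObsoleteInfo = false;
  bool DebugNames = false;
  bool DebugPubNames = false;
  bool DebugPubTypes = false;
  bool GDBIndex = false;

  void setDoRemoveObsoleteInfo(bool V) { RemoveObsoleteInfo = V; }
  void setDoDebugNames(bool V) { DebugNames = V; }
  void setDoDebugPubNames(bool V) { DebugPubNames = V; }
  void setDoDebugPubTypes(bool V) { DebugPubTypes = V; }
  void setDoGDBIndex(bool V) { GDBIndex = V; }
};

// Maps one Appendix E flag onto the proposed setters. Returns false for
// flags this sketch does not handle.
bool applyFlag(std::string_view Flag, ProposedLinkerStages &Stages) {
  if (Flag == "--garbage-collect")
    Stages.setDoRemoveObsoleteInfo(true); // assumed mapping
  else if (Flag == "--build-debug-names")
    Stages.setDoDebugNames(true);
  else if (Flag == "--build-debug-pubnames")
    Stages.setDoDebugPubNames(true);
  else if (Flag == "--build-debug-pubtypes")
    Stages.setDoDebugPubTypes(true);
  else if (Flag == "--build-gdb-index")
    Stages.setDoGDBIndex(true);
  else
    return false;
  return true;
}

// Splits the value of --strip=<name1,name2> into section names,
// skipping empty entries such as the one in "a,,b".
std::vector<std::string> parseStripList(const std::string &Value) {
  std::vector<std::string> Names;
  std::istringstream SS(Value);
  std::string Name;
  while (std::getline(SS, Name, ','))
    if (!Name.empty())
      Names.push_back(Name);
  return Names;
}
```

Keeping the flag-to-stage mapping in one place would make it easy to see which options still lack library support as DWARFLinker's interface evolves.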
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
