[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Fangrui Song via llvm-dev
llvm-dev at lists.llvm.org
Mon Aug 31 14:54:18 PDT 2020
On 2020-08-31, Alexey via llvm-dev wrote:
>Hi James,
>
>Thank you for the comments.
>
>>I think we're not terribly far from that ideal, now, for ELF. Maybe
>>only these three things need to be done? --
>> 1. Teach lld how to emit a separated debuginfo output file
>>directly, without requiring an objcopy step.
>> 2. Integrate DWARFLinker into lld.
>> 3. Create a new tool which takes the separated debuginfo and
>>DWO/DWP files and uses DWARFLinker library
>>to create a new (dwarf-linked) separated-debug file, that doesn't
>>depend on DWO/DWP files.
>
>The three goals which you've described are our long-term goals.
>Indeed, the best solution would be to create valid optimized debug
>info without additional
>stages and additional modifications of resulting binaries.
>
>There was an attempt to use DWARFLinker from lld -
>https://reviews.llvm.org/D74169
>It has not received enough support to be integrated yet. There are fair
>reasons for that:
>
>1. Execution time. Linking the clang binary with DWARFLinker takes
>72 sec, while linking with lld alone takes 9 sec, i.e. 8x longer
>than the usual linking time.
>
>2. "Removing obsolete debug info" could not be switched off. Thus, lld
>could not use DWARFLinker for other tasks (like generation of index
>tables - .gdb_index, .debug_names) without significant performance
>degradation.
>
>3. DWARFLinker does not support split dwarf at the moment.
>
>None of these reasons is a blocker, and I believe the implementation from
>D74169 might be integrated and
>incrementally improved if there were agreement on that.
>
>Using DWARFLinker from llvm-dwarfutil is another possibility to use
>and improve it.
>When finally implemented - llvm-dwarfutil should solve the above three
>issues and there
>would probably be more reasons to include DWARFLinker into lld.
>
>Even if we would have the best solution - it is still useful to have a
>tool like llvm-dwarfutil
>for cases when it is necessary to process already created binaries.
>
>So in short, the suggested tool - llvm-dwarfutil - is a step towards
>the ideal solution.
>Its benefit is that it could be used until we have created the best
>solution, or for cases
>where "the best solution" is not applicable.
>
>Thank you, Alexey.
>
>
>On 29.08.2020 00:23, James Y Knight wrote:
>>If we're designing a new tool and process, it would be wonderful if
>>it did not require multiple stages of copying and slightly modifying
>>the binary, in order to create final output with separate
>>debug info. It seems to me that the variants of this sort of thing
>>which exist today are somewhat suboptimal.
>>
>>With Mach-O and dsymutil:
>> 1. Given a collection of object files (which contain debuginfo),
>>link a binary with ld. The binary then includes special references
>>to the object files that were actually used as part of the link.
>> 2. Given the linked binary, and all of the same object files, link
>>the debuginfo with dsymutil.
>> 3. Strip the references to the object file paths from the binary.
>> Finally, you have a binary without debug info, and a dsym
>>debuginfo file. But it would be better if the binary created in step
>>1 didn't need to include the extraneous object-file path info, and
>>that was instead emitted in a second file. Then we wouldn't need
>>step 3.
>>
>>With "normal" ELF:
>> 1. Given a collection of object files (which contain debuginfo),
>>link a binary with ld, which includes linking all the debug info
>>into the binary.
>> 2. Given the linked binary, objcopy --only-keep-debug to create a
>>new separated debug file.
>> 3. Given the linked binary, objcopy --strip-debug to create a copy
>>of the binary without debug info.
>> Finally you have a binary without debug info, and a separate debug
>>file. But it would be better if the linker could just write the
>>debug info into a separate file in the first place, then we'd only
>>have the one step. (But, downside, the linker needs to manage all
>>the debug info, which can be excessively large.)
>>
>>With "split-dwarf" ELF support:
>> 1. Given object files (which exclude /most/ but not all of the
>>debuginfo), link a binary. The binary will include that smaller set
>>of debug info.
>> 2. Given the collection of dwo files corresponding to the object
>>files, run the "dwp" tool to create a dwp file.
>> 3. objcopy --only-keep-debug
>> 4. --strip-debug
>> And then you need to keep both a debug file /and/ a dwp file,
>>which is weird.
>>
>>
>>I think, ideally, users would have the following three /good/ options:
>> Easy option: store debuginfo in the object files, and have the
>>linker create a pair of {binary, separated dwarf-optimized
>>debuginfo} files directly from the object files.
>> More scalable option: emit (most of the) debuginfo in separate
>>*.dwo files using -gsplit-dwarf, and then,
>> 1. run the linker on the object files to create a pair of
>>{binary, separated debuginfo} files. In this case the latter file
>>contains the minimal debuginfo which was in the object files.
>> 2. run a second tool, which reads the minimal debuginfo from
>>above, and all the DWO files, and creates a full
>>optimized/deduplicated debuginfo output file.
>> Faster developer builds: Like previous, but omit step 2 -- running
>>the debugger directly after step 1 can use the dwo files on-disk.
>>
>>I think we're not terribly far from that ideal, now, for ELF. Maybe
>>only these three things need to be done? --
>> 1. Teach lld how to emit a separated debuginfo output file
>>directly, without requiring an objcopy step.
This is very similar to Solaris's ancillary objects (ET_SUNW_ANCILLARY).
There are more details on http://www.linker-aliens.org/blogs/ali/entry/ancillary_objects_separate_debug_elf/
In short, Solaris's `ld -z ancillary[=outfile]` writes non-SHF_ALLOC sections to the
ancillary object. Perhaps we will need some coordination with GNU. Some
GNU folks are interested in a new object file type:
https://groups.google.com/forum/#!topic/generic-abi/tJq7anc6WKs
A debug file created by {,llvm-}objcopy --only-keep-debug has different
contents (see https://reviews.llvm.org/D67137 for details): it keeps
non-SHF_ALLOC sections and SHT_NOTE sections. The blog post linked above
does not say whether program headers are retained in the debug file, but
{,llvm-}objcopy --only-keep-debug keeps a copy (neither gdb nor lldb needs
the program headers).
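
To make the difference between the two selection rules concrete, here is a
minimal sketch. It is not code from lld, llvm-objcopy or the Solaris linker:
the Section struct and both function names are hypothetical, and only the
numeric values of SHT_NOTE and SHF_ALLOC come from the generic ELF
specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Values from the generic ELF specification, renamed to avoid clashing
// with the macros in <elf.h>.
constexpr uint32_t kShtNote = 7;    // SHT_NOTE
constexpr uint64_t kShfAlloc = 0x2; // SHF_ALLOC

// Hypothetical, simplified view of a section header (not an LLVM type).
struct Section {
  std::string Name;
  uint32_t Type;
  uint64_t Flags;
};

// Rule described above for `ld -z ancillary`: sections that are not
// loaded at run time (non-SHF_ALLOC) go to the ancillary object.
bool goesToAncillary(const Section &S) {
  return (S.Flags & kShfAlloc) == 0;
}

// Rule described above for --only-keep-debug: non-SHF_ALLOC sections
// and SHT_NOTE sections are kept in the debug file.
bool keptByOnlyKeepDebug(const Section &S) {
  assert(!S.Name.empty() && "section must have a name");
  return (S.Flags & kShfAlloc) == 0 || S.Type == kShtNote;
}
```

With these rules a section such as .note.gnu.build-id (SHT_NOTE, SHF_ALLOC)
ends up in the --only-keep-debug output but would not be moved to an
ancillary object, which is one place where the two kinds of debug file differ.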
>> 2. Integrate DWARFLinker into lld.
>> 3. Create a new tool which takes the separated debuginfo and
>>DWO/DWP files and uses DWARFLinker library to create a new
>>(dwarf-linked) separated-debug file, that doesn't depend on DWO/DWP
>>files.
>>
>>My hope is that the tool you're creating will be the implementation
>>of #3, but I'm afraid the intent is for this tool to be an
>>additional stage that non-split-dwarf users would need to run
>>post-link, /instead of/ integrating DWARFLinker into lld.
>>On Tue, Aug 25, 2020 at 10:29 AM Alexey via llvm-dev
>><llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>> Any thoughts on this?
>> Thanks in advance, Alexey.
>>
>> ======================================================================
>>
>> llvm-dwarfutil (Apndx A) is a tool for processing debug info (DWARF)
>> located in built binary files to improve debug info quality,
>> reduce debug info size and accelerate debug info processing.
>> Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
>> WASM (Apndx C).
>>
>> ======================================================================
>>
>> Specifically, the tool would do:
>>
>> - Remove obsolete debug info which refers to code deleted by the
>> linker doing garbage collection (gc-sections).
>>
>> - Deduplicate debug type definitions to reduce the size of the
>> resulting binary.
>>
>> - Build accelerator/index tables.
>> = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>> .debug_pubtypes.
>>
>> - Strip unneeded tables.
>> = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>> .debug_pubtypes.
>>
>> - Compress or decompress debug info as requested.
>>
>> Possible feature:
>>
>> - Join split dwarf .dwo files in a single file containing all debug
>> info (convert split DWARF into monolithic DWARF).
>>
>> ======================================================================
>>
>> User interface:
>>
>> OVERVIEW: A tool for optimizing debug info located in the built
>> binary.
>>
>> USAGE: llvm-dwarfutil [options] input output
>>
>> OPTIONS: (Apndx E)
>>
>> ======================================================================
>>
>> Implementation notes:
>>
>> 1. Removing obsolete debug info would be done using DWARFLinker llvm
>> library.
>>
>> 2. Data types deduplication would be done using DWARFLinker llvm
>> library.
>>
>> 3. Accelerator/index tables would be generated using DWARFLinker llvm
>> library.
>>
>> 4. Interface of DWARFLinker library would be changed in such a way
>> that it would be possible to switch on/off various stages:
>>
>> class DWARFLinker {
>> public:
>>   // Each setter switches one stage on or off.
>>   void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>
>>   void setDoAppleNames(bool DoAppleNames = false);
>>   void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>   void setDoAppleTypes(bool DoAppleTypes = false);
>>   void setDoObjC(bool DoObjC = false);
>>   void setDoDebugPubNames(bool DoDebugPubNames = false);
>>   void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>
>>   void setDoDebugNames(bool DoDebugNames = false);
>>   void setDoGDBIndex(bool DoGDBIndex = false);
>> };
>>
>> 5. Copying source file contents, stripping tables and
>> compressing/decompressing tables would be done by the ObjCopy llvm
>> library (extracted from llvm-objcopy):
>>
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                              object::COFFObjectFile &In, Buffer &Out);
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                              object::ELFObjectFileBase &In, Buffer &Out);
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                              object::MachOObjectFile &In, Buffer &Out);
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                              object::WasmObjectFile &In, Buffer &Out);
>>
>> 6. Address ranges and single addresses pointing to removed code
>> should be marked with a tombstone value in the input file:
>>
>> -2 for .debug_ranges and .debug_loc.
>> -1 for other .debug* tables.
>>
>> 7. Prototype implementation - https://reviews.llvm.org/D86539.
>>
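
As an illustration of item 6, the tombstone choice can be written as a small
helper. This is only a sketch: tombstoneFor is a hypothetical name, not a
DWARFLinker or lld function, and it assumes that -1 and -2 are interpreted
as the maximum address and the maximum address minus one for the given
address size. The reason .debug_ranges and .debug_loc need a different value
is that the maximum address is already reserved there as the marker of a
base address selection entry.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the tombstone value for a debug section,
// given the address size in bytes (4 or 8).
uint64_t tombstoneFor(const std::string &SectionName, unsigned AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");

  // "-1" truncated to the address size, i.e. the maximum address.
  const uint64_t MaxAddr = AddrSize == 8 ? UINT64_MAX : UINT32_MAX;

  // In .debug_ranges and .debug_loc the maximum address already marks a
  // base address selection entry, so "-2" is used instead.
  const bool IsRangeOrLoc =
      SectionName == ".debug_ranges" || SectionName == ".debug_loc";
  return IsRangeOrLoc ? MaxAddr - 1 : MaxAddr;
}
```

A consumer such as llvm-dwarfutil could then compare each address it reads
against tombstoneFor(section, address size) to recognize entries that refer
to removed code.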
>> ======================================================================
>>
>> Roadmap:
>>
>> 1. Refactor llvm-objcopy to extract its implementation into a separate
>> library, ObjCopy (in the LLVM tree).
>>
>> 2. Create a command line utility using the existing DWARFLinker and
>> ObjCopy implementation. The first version is supposed to work with only
>> ELF input object files. It would take an input ELF file with
>> unoptimized debug info and create an output ELF file with optimized
>> debug info. That version would be done out of the llvm tree.
>>
>> 3. Make the tool able to work in multi-threaded mode.
>>
>> 4. Consider including it into the LLVM tree.
>>
>> 5. Support DWARF5 tables.
>>
>> ======================================================================
>>
>> Appendix A. Should this tool be implemented as a new tool or as an
>> extension to dsymutil/llvm-objcopy?
>>
>> There already exists a tool which removes obsolete debug info on
>> darwin - dsymutil. Why create another tool instead of extending the
>> existing dsymutil/llvm-objcopy?
>>
>> The main functionality of dsymutil is located in a separate library
>> - DWARFLinker. Thus, the dsymutil utility is a command-line interface
>> for DWARFLinker. dsymutil has another type of input/output data: it
>> takes several object files and an address map as input and creates a
>> .dSYM bundle with linked debug info as output. llvm-dwarfutil would
>> take a built executable as input and create an optimized executable
>> as output. Additionally, there would be many command-line options
>> specific to only one utility. This means that these utilities
>> (implementing the command-line interface) would differ significantly.
>> It makes sense not to put another command-line utility inside the
>> existing dsymutil, but to make it a separate utility. That is the
>> reason why llvm-dwarfutil is suggested to be implemented not as a
>> sub-part of dsymutil but as a separate tool.
>>
>> Please share your preference: whether llvm-dwarfutil should be a
>> separate utility, or a variant of dsymutil compiled for ELF?
>>
>> ======================================================================
>>
>> Appendix B. The MachO object file format is already supported by
>> dsymutil. Depending on the decision whether llvm-dwarfutil would be
>> done as a subproject of dsymutil or as a separate utility, MachO
>> would be supported or not.
>>
>> ======================================================================
>>
>> Appendix C. Support for the COFF and WASM object file formats is
>> presented as a possible future improvement. It would be quite easy to
>> add them, given that llvm-objcopy already supports these formats. It
>> would also require supporting the DWARF6-suggested tombstone values
>> (-1/-2).
>>
>> ======================================================================
>>
>> Appendix D. Documentation.
>>
>> - proposal for DWARF6 which suggested -1/-2 values for marking bad
>> addresses: http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>> - dsymutil tool: https://llvm.org/docs/CommandGuide/dsymutil.html
>> - proposal "Remove obsolete debug info in lld.":
>> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>
>> ======================================================================
>>
>> Appendix E. Possible command line options:
>>
>> DwarfUtil Options:
>>
>>   --build-aranges            - Generate .debug_aranges table.
>>   --build-debug-names        - Generate .debug_names table.
>>   --build-debug-pubnames     - Generate .debug_pubnames table.
>>   --build-debug-pubtypes     - Generate .debug_pubtypes table.
>>   --build-gdb-index          - Generate .gdb_index table.
>>   --compress                 - Compress debug tables.
>>   --decompress               - Decompress debug tables.
>>   --deduplicate-types        - Do ODR deduplication for debug types.
>>   --garbage-collect          - Do garbage collecting for debug info.
>>   --num-threads=<n>          - Specify the maximum number (n) of
>>                                simultaneous threads to use when
>>                                optimizing input file. Defaults to the
>>                                number of cores on the current machine.
>>   --strip-all                - Strip all debug tables.
>>   --strip=<name1,name2>      - Strip specified debug info tables.
>>   --strip-unoptimized-debug  - Strip all unoptimized debug tables.
>>   --tombstone=<value>        - Tombstone value used as a marker of
>>                                invalid address.
>>     =bfd                     -   BFD default value
>>     =dwarf6                  -   Dwarf v6.
>>   --verbose                  - Enable verbose logging and encoding
>>                                details.
>>
>> Generic Options:
>>
>>   --help                     - Display available options
>>                                (--help-hidden for more)
>>   --version                  - Display the version of this program
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>