[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Alexey via llvm-dev
llvm-dev at lists.llvm.org
Tue Aug 25 07:29:18 PDT 2020
Hi,
We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
Any thoughts on this?
Thanks in advance, Alexey.
======================================================================
llvm-dwarfutil (Appendix A) is a tool for processing the debug info (DWARF)
located in built binary files, in order to improve debug info quality,
reduce debug info size, and accelerate debug info processing.
Supported object file formats: ELF, MachO (Appendix B), COFF (Appendix C),
WASM (Appendix C).
======================================================================
Specifically, the tool would:
- Remove obsolete debug info that refers to code deleted by the linker
  during garbage collection (--gc-sections).
- Deduplicate debug type definitions to reduce the size of the resulting
  binary.
- Build accelerator/index tables:
  = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
    .debug_pubtypes.
- Strip unneeded tables:
  = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
    .debug_pubtypes.
- Compress or decompress debug info as requested.
Possible feature:
- Join split DWARF .dwo files into a single file containing all debug info
  (i.e. convert split DWARF into monolithic DWARF).
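To make the type deduplication item more concrete, here is a much-simplified C++ sketch of the ODR idea: under the C++ One Definition Rule, type definitions with the same fully qualified name are assumed to be identical, so later copies can be replaced by references to the first one. `TypeDef` and `deduplicateTypes` are hypothetical illustrations, not DWARFLinker API; DWARFLinker's real ODR uniquing works on declaration contexts and checks more than a name.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a type DIE: its fully qualified name
// (e.g. "ns::Foo") and the offset at which it was found.
struct TypeDef {
  std::string QualifiedName;
  uint64_t Offset;
};

// Returns, for each input type, the offset of the canonical definition it
// should refer to: the first definition seen with the same qualified name.
// Relies on the ODR assumption that same-named types are identical.
std::vector<uint64_t> deduplicateTypes(const std::vector<TypeDef> &Types) {
  std::unordered_map<std::string, uint64_t> Canonical;
  std::vector<uint64_t> Result;
  Result.reserve(Types.size());
  for (const TypeDef &T : Types) {
    // try_emplace inserts only if the name is new; in both cases it
    // returns an iterator to the canonical entry for that name.
    auto It = Canonical.try_emplace(T.QualifiedName, T.Offset).first;
    Result.push_back(It->second);
  }
  return Result;
}
```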
======================================================================
User interface:
OVERVIEW: A tool for optimizing debug info located in the built binary.
USAGE: llvm-dwarfutil [options] input output
OPTIONS: (Appendix E)
======================================================================
Implementation notes:
1. Removing obsolete debug info would be done using the DWARFLinker LLVM
   library.
2. Data type deduplication would be done using the DWARFLinker LLVM library.
3. Accelerator/index tables would be generated using the DWARFLinker LLVM
   library.
4. The interface of the DWARFLinker library would be changed so that the
   various stages could be switched on/off:
    class DWARFLinker {
    public:
      // Proposed switches for the individual linking stages.
      void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
      void setDoAppleNames(bool DoAppleNames = false);
      void setDoAppleNamespaces(bool DoAppleNamespaces = false);
      void setDoAppleTypes(bool DoAppleTypes = false);
      void setDoObjC(bool DoObjC = false);
      void setDoDebugPubNames(bool DoDebugPubNames = false);
      void setDoDebugPubTypes(bool DoDebugPubTypes = false);
      void setDoDebugNames(bool DoDebugNames = false);
      void setDoGDBIndex(bool DoGDBIndex = false);
    };
5. Copying the source file contents, stripping tables, and
   compressing/decompressing tables would be done by the ObjCopy LLVM
   library (extracted from llvm-objcopy):
    Error executeObjcopyOnBinary(const CopyConfig &Config,
                                 object::COFFObjectFile &In, Buffer &Out);
    Error executeObjcopyOnBinary(const CopyConfig &Config,
                                 object::ELFObjectFileBase &In, Buffer &Out);
    Error executeObjcopyOnBinary(const CopyConfig &Config,
                                 object::MachOObjectFile &In, Buffer &Out);
    Error executeObjcopyOnBinary(const CopyConfig &Config,
                                 object::WasmObjectFile &In, Buffer &Out);
6. Address ranges and single addresses pointing to removed code should be
   marked with a tombstone value in the input file:
   -2 for .debug_ranges and .debug_loc;
   -1 for the other .debug* tables.
7. Prototype implementation - https://reviews.llvm.org/D86539.
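The tombstone rule from note 6 can be sketched as follows. The reason for two values: in the pre-DWARF5 .debug_ranges and .debug_loc sections, an entry whose beginning address is the largest representable address (-1) is a base address selection entry, so -1 cannot mark a dead range there. `tombstoneFor` and `isTombstone` are hypothetical helpers, and truncating the value to the target's address size (4 or 8 bytes) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not an LLVM API): the tombstone value from note 6
// for a debug section, assuming the value is truncated to the target's
// address size (4 or 8 bytes).
uint64_t tombstoneFor(std::string_view SectionName, unsigned AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");
  // "-1" at the given address size: 0xffffffff or 0xffffffffffffffff.
  const uint64_t MinusOne = AddrSize == 8 ? UINT64_MAX : UINT64_C(0xffffffff);
  // In pre-DWARF5 .debug_ranges/.debug_loc, an entry starting with the
  // largest address is a base address selection entry, so use -2 there.
  if (SectionName == ".debug_ranges" || SectionName == ".debug_loc")
    return MinusOne - 1;
  return MinusOne;
}

// Hypothetical helper: true if Addr, read from SectionName, equals the
// tombstone, i.e. it referred to code removed by the linker.
bool isTombstone(std::string_view SectionName, unsigned AddrSize,
                 uint64_t Addr) {
  return Addr == tombstoneFor(SectionName, AddrSize);
}
```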
======================================================================
Roadmap:
1. Refactor llvm-objcopy to extract its implementation into a separate
   library, ObjCopy (in the LLVM tree).
2. Create a command-line utility using the existing DWARFLinker and ObjCopy
   implementations. The first version is supposed to work only with ELF
   input object files. It would take an input ELF file with unoptimized
   debug info and create an output ELF file with optimized debug info.
   That version would be developed outside the LLVM tree.
3. Make the tool able to work in multi-threaded mode.
4. Consider including it in the LLVM tree.
5. Support DWARF5 tables.
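For roadmap item 3, one possible shape of multi-threaded processing is sketched below: compile units are handed out to at most --num-threads workers, with the default taken from std::thread::hardware_concurrency() (Appendix E proposes defaulting to the number of cores). The function names and the convention that 0 means "not specified" are assumptions. The sketch also treats compile units as independent, which cross-unit work such as type deduplication would not allow without extra synchronization.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical: resolves the --num-threads value. 0 is treated as "not
// specified" and falls back to the number of hardware threads;
// hardware_concurrency() may return 0 when unknown, hence the max().
unsigned resolveThreadCount(unsigned Requested) {
  if (Requested != 0)
    return Requested;
  return std::max(1u, std::thread::hardware_concurrency());
}

// Hypothetical: calls Work(I) for every compile unit index I in
// [0, NumUnits) using at most NumThreads worker threads. Assumes the
// units can be processed independently of each other.
void forEachUnitParallel(std::size_t NumUnits, unsigned NumThreads,
                         const std::function<void(std::size_t)> &Work) {
  const std::size_t WorkerCount =
      std::min<std::size_t>(std::max(NumThreads, 1u), NumUnits);
  std::atomic<std::size_t> Next{0};
  std::vector<std::thread> Workers;
  Workers.reserve(WorkerCount);
  for (std::size_t T = 0; T < WorkerCount; ++T)
    Workers.emplace_back([&] {
      // Each worker repeatedly claims the next unclaimed unit index.
      for (std::size_t I = Next++; I < NumUnits; I = Next++)
        Work(I);
    });
  for (std::thread &W : Workers)
    W.join();
}
```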
======================================================================
Appendix A. Should this tool be implemented as a new tool or as an extension
to dsymutil/llvm-objcopy?
There already exists a tool which removes obsolete debug info on
Darwin: dsymutil.
Why create another tool instead of extending the already existing
dsymutil/llvm-objcopy?
The main functionality of dsymutil is located in a separate library,
DWARFLinker, so the dsymutil utility is essentially a command-line
interface for DWARFLinker. dsymutil has a different kind of input/output
data: it takes several object files and an address map as input and
creates a .dSYM bundle with linked debug info as output. llvm-dwarfutil
would take a built executable as input and create an optimized executable
as output. Additionally, there would be many command-line options specific
to only one of the utilities.
This means that these utilities (implementing the command-line interface)
would differ significantly. It makes sense not to put another command-line
utility inside the existing dsymutil, but to make it a separate utility.
That is why llvm-dwarfutil is suggested to be implemented as a separate
tool rather than as a sub-part of dsymutil.
Please share your preference: should llvm-dwarfutil be a separate
utility, or a variant of dsymutil compiled for ELF?
======================================================================
Appendix B. The MachO object file format is already supported by dsymutil.
Depending on whether llvm-dwarfutil is done as a subproject of dsymutil or
as a separate utility, MachO would or would not be supported.
======================================================================
Appendix C. Support for the COFF and WASM object file formats is presented
as a possible future improvement. It would be quite easy to add them, given
that llvm-objcopy already supports these formats. It would also require
supporting the DWARF6-suggested tombstone values (-1/-2).
======================================================================
Appendix D. Documentation.
- Proposal for DWARF6 which suggests -1/-2 values for marking bad addresses:
  http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
- dsymutil tool:
  https://llvm.org/docs/CommandGuide/dsymutil.html
- Proposal "Remove obsolete debug info in lld.":
  http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
======================================================================
Appendix E. Possible command-line options:
DwarfUtil Options:
  --build-aranges            - Generate .debug_aranges table.
  --build-debug-names        - Generate .debug_names table.
  --build-debug-pubnames     - Generate .debug_pubnames table.
  --build-debug-pubtypes     - Generate .debug_pubtypes table.
  --build-gdb-index          - Generate .gdb_index table.
  --compress                 - Compress debug tables.
  --decompress               - Decompress debug tables.
  --deduplicate-types        - Do ODR deduplication for debug types.
  --garbage-collect          - Do garbage collection for debug info.
  --num-threads=<n>          - Specify the maximum number (n) of simultaneous
                               threads to use when optimizing the input file.
                               Defaults to the number of cores on the
                               current machine.
  --strip-all                - Strip all debug tables.
  --strip=<name1,name2>      - Strip the specified debug info tables.
  --strip-unoptimized-debug  - Strip all unoptimized debug tables.
  --tombstone=<value>        - Tombstone value used as a marker of an
                               invalid address:
    =bfd                     -   BFD default value.
    =dwarf6                  -   DWARF v6.
  --verbose                  - Enable verbose logging and encoding details.
Generic Options:
  --help                     - Display available options (--help-hidden
                               for more).
  --version                  - Display the version of this program.
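A sketch of how some of the options above might map onto the DWARFLinker switches from implementation note 4, and how the --strip list could be split. `StageConfig`, `splitCommaList` and `applyFlag` are hypothetical; `StageConfig` only mirrors the proposed setDo* setters so that the example compiles on its own. Flags without a proposed setter (e.g. --build-aranges, --deduplicate-types) are not handled here.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in mirroring some of the setDo* switches proposed for
// DWARFLinker, plus the list of tables to strip (which note 5 assigns to
// the ObjCopy library).
struct StageConfig {
  bool DoRemoveObsoleteInfo = false;
  bool DoDebugPubNames = false;
  bool DoDebugPubTypes = false;
  bool DoDebugNames = false;
  bool DoGDBIndex = false;
  std::vector<std::string> TablesToStrip;
};

// Splits the value of --strip=<name1,name2> on commas, skipping empty items.
std::vector<std::string> splitCommaList(const std::string &Value) {
  std::vector<std::string> Names;
  std::istringstream Stream(Value);
  std::string Item;
  while (std::getline(Stream, Item, ','))
    if (!Item.empty())
      Names.push_back(Item);
  return Names;
}

// Applies one flag to Cfg. Returns false for flags this sketch does not
// handle.
bool applyFlag(const std::string &Flag, StageConfig &Cfg) {
  const std::string StripPrefix = "--strip=";
  if (Flag == "--garbage-collect") {
    Cfg.DoRemoveObsoleteInfo = true;
  } else if (Flag == "--build-debug-pubnames") {
    Cfg.DoDebugPubNames = true;
  } else if (Flag == "--build-debug-pubtypes") {
    Cfg.DoDebugPubTypes = true;
  } else if (Flag == "--build-debug-names") {
    Cfg.DoDebugNames = true;
  } else if (Flag == "--build-gdb-index") {
    Cfg.DoGDBIndex = true;
  } else if (Flag.compare(0, StripPrefix.size(), StripPrefix) == 0) {
    for (std::string &Name : splitCommaList(Flag.substr(StripPrefix.size())))
      Cfg.TablesToStrip.push_back(std::move(Name));
  } else {
    return false;
  }
  return true;
}
```

Keeping the tables to strip in a separate list follows note 5: stripping would be done by the ObjCopy library rather than by DWARFLinker.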