[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
James Henderson via llvm-dev
llvm-dev at lists.llvm.org
Thu Aug 27 00:53:25 PDT 2020
On Wed, 26 Aug 2020 at 15:01, Alexey <avl.lapshin at gmail.com> wrote:
>
> On 26.08.2020 10:58, James Henderson wrote:
>
> In principle, this sounds reasonable to me. I don't know enough about
> dsymutil's interface to know whether it makes sense to try to make it
> multi-format compatible or not. If it doesn't I'm perfectly happy for a new
> tool to be added using the DWARFLinker library.
>
> Some more general thoughts:
> 1) Assuming the proposal is accepted, this should be introduced piecemeal
> into LLVM from the beginning as it is developed, rather than having a
> separate step 4 in the roadmap.
> 2) The default tombstone values used for dead debug data should be those
> produced by LLD, in my opinion. In an ideal world, we'd factor them into
> some shared constant. Note that, at the time of writing, I believe LLD is
> using BFD-style tombstones, not the new -1/-2.
>
> agreed.
>
> 3) Does the DWARFLinker library already support multi-threading? If not,
> it might be a lot of work making things thread-safe.
>
> It does, but in a limited way: it can run the analyzing and cloning
> stages in parallel, so the maximal speedup is two times.
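A minimal sketch of why overlapping just two stages caps the gain at roughly 2x. The `analyze` and `clone` functions and the `Analyzed` record are hypothetical stand-ins, not DWARFLinker's API: one thread analyzes the next object file while the calling thread clones the previous one. In the best case the two stages fully overlap, but each stage on its own stays sequential.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-file result of the analysis stage.
struct Analyzed {
  std::string Name;
  int KeptDIEs;
};

// Hypothetical stand-ins for the analysis and cloning stages.
static Analyzed analyze(const std::string &File) {
  return {File, static_cast<int>(File.size())};
}
static std::string clone(const Analyzed &A) {
  return A.Name + ":" + std::to_string(A.KeptDIEs);
}

// Runs analysis on a worker thread and cloning on the calling thread.
// Each stage is still sequential, so the best case is the two stages
// fully overlapping (about a 2x speedup).
std::vector<std::string> linkPipelined(const std::vector<std::string> &Files) {
  std::queue<std::optional<Analyzed>> Queue; // std::nullopt marks the end
  std::mutex M;
  std::condition_variable CV;

  std::thread Analyzer([&] {
    for (const std::string &F : Files) {
      Analyzed A = analyze(F);
      {
        std::lock_guard<std::mutex> Lock(M);
        Queue.push(std::move(A));
      }
      CV.notify_one();
    }
    {
      std::lock_guard<std::mutex> Lock(M);
      Queue.push(std::nullopt);
    }
    CV.notify_one();
  });

  std::vector<std::string> Out;
  for (;;) { // cloning stage: consume analyzed files in order
    std::unique_lock<std::mutex> Lock(M);
    CV.wait(Lock, [&] { return !Queue.empty(); });
    std::optional<Analyzed> Item = std::move(Queue.front());
    Queue.pop();
    Lock.unlock();
    if (!Item)
      break;
    Out.push_back(clone(*Item));
  }
  Analyzer.join();
  assert(Out.size() == Files.size());
  return Out;
}
```

For example, `linkPipelined({"a.o", "bb.o"})` returns the cloned results in input order, because the single FIFO queue preserves the order in which files were analyzed.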
>
> To have a greater performance impact, it could probably be parallelized on
> a per-compilation-unit basis.
>
> Another thing is that dsymutil currently loads all DIEs from a source
> object file into memory and releases them only after that object file is
> processed. For a non-linked binary this works OK (big binaries are usually
> compiled from several object files). For a linked binary, it means all
> DIEs are loaded into memory at once, which requires a lot of memory. The
> solution for this problem could be to split the source data on a
> per-compilation-unit basis rather than per file.
>
> Yes, making dsymutil/dwarfutil work on a per-compilation-unit basis and
> support multi-threading is quite a big piece of work. It looks like it
> would be good for both dsymutil and dwarfutil.
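A minimal sketch of per-compilation-unit parallelism, assuming hypothetical `CompileUnit` and `processUnit` types that stand in for the real per-CU analysis and cloning. Workers pull the next unit from a shared atomic counter, and each result goes to its own slot, so the output order stays deterministic regardless of scheduling.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical compilation unit holding only its own DIEs.
struct CompileUnit {
  std::string Name;
  std::vector<int> DIEs; // stand-in for parsed DIEs
};

// Hypothetical per-CU work: analyze and clone one CU, return its output size.
static std::size_t processUnit(const CompileUnit &CU) {
  return CU.DIEs.size();
}

// Distributes compilation units across worker threads. Each worker claims
// the next unprocessed CU through an atomic counter; each result slot is
// written by exactly one thread, so no lock is needed for Results.
std::vector<std::size_t> processPerUnit(const std::vector<CompileUnit> &Units,
                                        unsigned NumThreads) {
  assert(NumThreads > 0 && "need at least one worker");
  std::vector<std::size_t> Results(Units.size());
  std::atomic<std::size_t> Next{0};
  auto Worker = [&] {
    for (std::size_t I = Next++; I < Units.size(); I = Next++)
      Results[I] = processUnit(Units[I]);
  };
  std::vector<std::thread> Pool;
  for (unsigned T = 0; T < NumThreads; ++T)
    Pool.emplace_back(Worker);
  for (std::thread &Th : Pool)
    Th.join();
  return Results;
}
```

Here all units are preloaded for simplicity; the memory saving discussed above would come from loading each unit's DIEs only when a worker claims it, which bounds the DIEs in memory by the number of in-flight units.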
>
> 4) Given that DWARF v6 doesn't exist yet, I wouldn't include that as an
> option name just yet...!
>
> Would "maxpc" be OK? --tombstone=maxpc ?
>
"maxpc" sounds reasonable for an initial stab at a name. I'm sure there's
something better out there, but I can't think of it, so no need to worry
if you don't come up with anything better!
>
> Thanks for looking at this! Please keep me involved in any related reviews
> etc.
>
> sure. Thank you for the comments.
>
> Alexey.
>
>
>
> James
>
> On Tue, 25 Aug 2020 at 15:29, Alexey via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
>
>> Hi,
>>
>> We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>> Any thoughts on this?
>> Thanks in advance, Alexey.
>>
>> ======================================================================
>>
>> llvm-dwarfutil (Appendix A) is a tool for processing debug info (DWARF)
>> located in built binary files to improve debug info quality, reduce
>> debug info size and accelerate debug info processing.
>> Supported object file formats: ELF, MachO (Appendix B), COFF (Appendix C),
>> WASM (Appendix C).
>>
>> ======================================================================
>>
>> Specifically, the tool would do:
>>
>> - Remove obsolete debug info which refers to code deleted by the linker
>> during garbage collection (--gc-sections).
>>
>> - Deduplicate debug type definitions to reduce the size of the resulting
>> binary.
>>
>> - Build accelerator/index tables.
>> = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>> .debug_pubtypes.
>>
>> - Strip unneeded tables.
>> = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>> .debug_pubtypes.
>>
>> - Compress or decompress debug info as requested.
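The accelerator tables listed above are hash tables keyed by name. As a minimal illustration (not DWARFLinker code), this sketch shows the DJB (Bernstein) hash used by Apple accelerator tables and the `hash % bucket count` step that picks a bucket; DWARF v5's .debug_names also builds on a DJB-based hash, with additional rules this sketch omits. `djbHash` and `bucketFor` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// DJB (Bernstein) hash: start at 5381, then Hash = Hash * 33 + Byte.
uint32_t djbHash(std::string_view Name) {
  uint32_t Hash = 5381;
  for (unsigned char C : Name)
    Hash = Hash * 33 + C; // wraps modulo 2^32
  return Hash;
}

// A name is placed in bucket (hash % bucket count) of the lookup table.
uint32_t bucketFor(std::string_view Name, uint32_t BucketCount) {
  assert(BucketCount > 0 && "table must have at least one bucket");
  return djbHash(Name) % BucketCount;
}
```

For instance, `djbHash("a")` is 5381 * 33 + 97 = 177670, which falls into bucket 0 of a 10-bucket table.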
>>
>> Possible feature:
>>
>> - Join split DWARF .dwo files into a single file containing all debug
>> info (convert split DWARF into monolithic DWARF).
>>
>> ======================================================================
>>
>> User interface:
>>
>> OVERVIEW: A tool for optimizing debug info located in the built binary.
>>
>> USAGE: llvm-dwarfutil [options] input output
>>
>> OPTIONS: (see Appendix E)
>>
>> ======================================================================
>>
>> Implementation notes:
>>
>> 1. Removing obsolete debug info would be done using the DWARFLinker LLVM
>> library.
>>
>> 2. Data type deduplication would be done using the DWARFLinker LLVM
>> library.
>>
>> 3. Accelerator/index tables would be generated using the DWARFLinker LLVM
>> library.
>>
>> 4. The interface of the DWARFLinker library would be changed so that
>> the various stages can be switched on/off:
>>
>> class DWARFLinker {
>> public:
>>   void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>
>>   void setDoAppleNames(bool DoAppleNames = false);
>>   void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>   void setDoAppleTypes(bool DoAppleTypes = false);
>>   void setDoObjC(bool DoObjC = false);
>>   void setDoDebugPubNames(bool DoDebugPubNames = false);
>>   void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>
>>   void setDoDebugNames(bool DoDebugNames = false);
>>   void setDoGDBIndex(bool DoGDBIndex = false);
>> };
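One way a front end could drive the proposed setters, sketched with a hypothetical `LinkerStages` struct and `applyOption` helper rather than the real DWARFLinker class. Every stage starts disabled, matching the `= false` defaults above, and only options from Appendix E switch stages on.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stage toggles mirroring a subset of the proposed setDo*()
// setters; every stage is off unless a command-line option enables it.
struct LinkerStages {
  bool DoRemoveObsoleteInfo = false;
  bool DoDebugPubNames = false;
  bool DoDebugPubTypes = false;
  bool DoDebugNames = false;
  bool DoGDBIndex = false;
};

// Maps an option from the proposed command line (Appendix E) to a stage.
// Returns false for options that do not select a linker stage.
bool applyOption(LinkerStages &Stages, std::string_view Opt) {
  if (Opt == "--garbage-collect")
    Stages.DoRemoveObsoleteInfo = true;
  else if (Opt == "--build-debug-pubnames")
    Stages.DoDebugPubNames = true;
  else if (Opt == "--build-debug-pubtypes")
    Stages.DoDebugPubTypes = true;
  else if (Opt == "--build-debug-names")
    Stages.DoDebugNames = true;
  else if (Opt == "--build-gdb-index")
    Stages.DoGDBIndex = true;
  else
    return false;
  return true;
}
```

Keeping the toggles in a plain struct lets the command-line layer be tested without constructing a linker; the struct would then be forwarded to the corresponding setters.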
>>
>> 5. Copying source file contents, stripping tables, and
>> compressing/decompressing tables would be done by the ObjCopy LLVM
>> library (extracted from llvm-objcopy):
>>
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>> object::COFFObjectFile &In, Buffer &Out);
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>> object::ELFObjectFileBase &In, Buffer &Out);
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>> object::MachOObjectFile &In, Buffer &Out);
>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>> object::WasmObjectFile &In, Buffer &Out);
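Since each executeObjcopyOnBinary() overload takes a different object-file type, the tool has to classify its input first. A minimal sketch using only leading magic bytes; `ObjFormat` and `identifyFormat` are hypothetical names, and a real implementation would more likely reuse LLVM's `identify_magic()` from `llvm/BinaryFormat/Magic.h`. It recognizes ELF, Wasm, thin Mach-O, and PE images (via the DOS `MZ` stub), but not universal Mach-O files or bare COFF objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

enum class ObjFormat { ELF, MachO, PECOFF, Wasm, Unknown };

// Classifies a file by its leading magic bytes, so that the matching
// per-format handler can be chosen.
ObjFormat identifyFormat(std::string_view Bytes) {
  auto StartsWith = [&](const char *Magic, std::size_t Len) {
    return Bytes.size() >= Len && std::memcmp(Bytes.data(), Magic, Len) == 0;
  };
  if (StartsWith("\x7f" "ELF", 4)) // split literal: 'E' is a hex digit
    return ObjFormat::ELF;
  if (StartsWith("\0asm", 4))
    return ObjFormat::Wasm;
  // Thin Mach-O: 32/64-bit magic, stored big- or little-endian.
  if (StartsWith("\xfe\xed\xfa\xce", 4) || StartsWith("\xfe\xed\xfa\xcf", 4) ||
      StartsWith("\xce\xfa\xed\xfe", 4) || StartsWith("\xcf\xfa\xed\xfe", 4))
    return ObjFormat::MachO;
  // PE/COFF executables begin with the DOS "MZ" stub.
  if (StartsWith("MZ", 2))
    return ObjFormat::PECOFF;
  return ObjFormat::Unknown;
}
```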
>>
>> 6. Address ranges and single addresses pointing to removed code should
>> be marked with a tombstone value in the input file:
>>
>> -2 for .debug_ranges and .debug_loc;
>> -1 for other .debug* tables.
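A minimal sketch of the tombstone check implied by item 6, assuming a producer writes the values exactly as listed, truncated to the address size. The split matters because, in DWARF v4 .debug_ranges and .debug_loc, an entry whose start address is -1 is a base address selection entry and a (0, 0) pair ends the list, so -1 cannot mark a dead range there. `tombstoneFor` and `isTombstone` are hypothetical helpers; BFD-style tombstones would need a separate mode.

```cpp
#include <cassert>
#include <cstdint>

// Returns the proposed tombstone for an address of AddrSize bytes (4 or 8):
// -2 in .debug_ranges/.debug_loc, -1 elsewhere, truncated to the address size.
uint64_t tombstoneFor(bool InRangesOrLoc, unsigned AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");
  uint64_t Value = InRangesOrLoc ? UINT64_MAX - 1 : UINT64_MAX; // -2 or -1
  return AddrSize == 4 ? (Value & 0xFFFFFFFFu) : Value;
}

// True if Addr marks code that the linker discarded.
bool isTombstone(uint64_t Addr, bool InRangesOrLoc, unsigned AddrSize) {
  return Addr == tombstoneFor(InRangesOrLoc, AddrSize);
}
```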
>>
>> 7. Prototype implementation - https://reviews.llvm.org/D86539.
>>
>> ======================================================================
>>
>> Roadmap:
>>
>> 1. Refactor llvm-objcopy to extract its implementation into a separate
>> library, ObjCopy (in the LLVM tree).
>>
>> 2. Create a command-line utility using the existing DWARFLinker and
>> ObjCopy implementations. The first version is supposed to work only with
>> ELF input object files. It would take an input ELF file with unoptimized
>> debug info and create an output ELF file with optimized debug info. That
>> version would be developed outside the LLVM tree.
>>
>> 3. Make the tool able to work in multi-threaded mode.
>>
>> 4. Consider including it in the LLVM tree.
>>
>> 5. Support DWARF5 tables.
>>
>> ======================================================================
>>
>> Appendix A. Should this tool be implemented as a new tool or as an
>> extension to dsymutil/llvm-objcopy?
>>
>> There already exists a tool which removes obsolete debug info on
>> Darwin: dsymutil. Why create another tool instead of extending the
>> existing dsymutil/llvm-objcopy?
>>
>> The main functionality of dsymutil is located in a separate library,
>> DWARFLinker. Thus, the dsymutil utility is a command-line interface for
>> DWARFLinker. dsymutil has a different kind of input/output data: it takes
>> several object files and an address map as input and creates a .dSYM
>> bundle with linked debug info as output. llvm-dwarfutil would take a
>> built executable as input and create an optimized executable as output.
>> Additionally, there would be many command-line options specific to only
>> one utility. This means that these utilities (implementing the
>> command-line interface) would differ significantly. It makes sense not to
>> put another command-line utility inside the existing dsymutil, but to
>> make it a separate utility. That is why llvm-dwarfutil is suggested to be
>> implemented as a separate tool rather than as a sub-part of dsymutil.
>>
>> Please share your preference: should llvm-dwarfutil be a separate
>> utility, or a variant of dsymutil compiled for ELF?
>>
>> ======================================================================
>>
>> Appendix B. The MachO object file format is already supported by dsymutil.
>> Whether llvm-dwarfutil would support MachO depends on the decision
>> whether it is done as a subproject of dsymutil or as a separate utility.
>>
>> ======================================================================
>>
>> Appendix C. Support for the COFF and WASM object file formats is presented
>> as a possible future improvement. It would be quite easy to add, given
>> that llvm-objcopy already supports these formats. It would also require
>> supporting the DWARF6-suggested tombstone values (-1/-2).
>>
>> ======================================================================
>>
>> Appendix D. Documentation.
>>
>> - Proposal for DWARF6 which suggested -1/-2 values for marking bad
>> addresses: http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>> - dsymutil tool: https://llvm.org/docs/CommandGuide/dsymutil.html
>> - Proposal "Remove obsolete debug info in lld.":
>> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>
>> ======================================================================
>>
>> Appendix E. Possible command line options:
>>
>> DwarfUtil Options:
>>
>> --build-aranges - generate .debug_aranges table.
>> --build-debug-names - generate .debug_names table.
>> --build-debug-pubnames - generate .debug_pubnames table.
>> --build-debug-pubtypes - generate .debug_pubtypes table.
>> --build-gdb-index - generate .gdb_index table.
>> --compress - Compress debug tables.
>> --decompress - Decompress debug tables.
>> --deduplicate-types - Do ODR deduplication for debug types.
>> --garbage-collect - Do garbage collection of debug info.
>> --num-threads=<n> - Specify the maximum number (n) of simultaneous
>>                     threads to use when optimizing the input file.
>>                     Defaults to the number of cores on the current
>>                     machine.
>> --strip-all - Strip all debug tables.
>> --strip=<name1,name2> - Strip specified debug info tables.
>> --strip-unoptimized-debug - Strip all unoptimized debug tables.
>> --tombstone=<value> - Tombstone value used as a marker of an invalid
>>                       address.
>>                       =bfd    - BFD default value.
>>                       =dwarf6 - DWARF v6 proposed values (-1/-2).
>> --verbose - Enable verbose logging and encoding details.
>>
>> Generic Options:
>>
>> --help - Display available options (--help-hidden for more)
>> --version - Display the version of this program
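A minimal sketch of handling two of the proposed options, `--strip=<name1,name2>` and the `--num-threads` default; `parseStripList` and `defaultNumThreads` are hypothetical helpers, not part of any existing LLVM tool. `std::thread::hardware_concurrency()` may return 0 when the core count is unknown, so the sketch falls back to one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <thread>
#include <vector>

// Splits the value of "--strip=<name1,name2>" into section names,
// skipping empty entries such as the one left by a trailing comma.
std::vector<std::string> parseStripList(std::string_view Value) {
  std::vector<std::string> Names;
  while (!Value.empty()) {
    std::size_t Comma = Value.find(',');
    std::string_view Name = Value.substr(0, Comma); // whole rest if npos
    if (!Name.empty())
      Names.emplace_back(Name);
    if (Comma == std::string_view::npos)
      break;
    Value.remove_prefix(Comma + 1);
  }
  return Names;
}

// Default for "--num-threads": the number of cores, or 1 if unknown.
unsigned defaultNumThreads() {
  unsigned Cores = std::thread::hardware_concurrency();
  return Cores == 0 ? 1 : Cores;
}
```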
>>
>>
>