[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

Fangrui Song via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 31 14:54:18 PDT 2020


On 2020-08-31, Alexey via llvm-dev wrote:
>Hi James,
>
>Thank you for the comments.
>
>>I think we're not terribly far from that ideal, now, for ELF. Maybe
>>only these three things need to be done? --
>>  1. Teach lld how to emit a separated debuginfo output file
>>directly, without requiring an objcopy step.
>>  2. Integrate DWARFLinker into lld.
>>  3. Create a new tool which takes the separated debuginfo and
>>DWO/DWP files and uses DWARFLinker library to create a new
>>(dwarf-linked) separated-debug file, that doesn't depend on DWO/DWP
>>files.
>
>The three goals which you've described are our long-term goals.
>Indeed, the best solution would be to create valid optimized debug
>info without additional stages and additional modifications of the
>resulting binaries.
>
>There was an attempt to use DWARFLinker from lld -
>https://reviews.llvm.org/D74169
>It has not received enough support to be integrated yet. There are fair
>reasons for that:
>
>1. Execution time. Processing the clang binary with DWARFLinker takes
>8x longer than the usual link: linking the clang binary with
>DWARFLinker takes 72 sec, linking with lld alone takes 9 sec.
>
>2. "Removing obsolete debug info" could not be switched off. Thus, lld
>could not use DWARFLinker for other tasks (like generation of index
>tables - .gdb_index, .debug_names) without significant performance
>degradation.
>
>3. DWARFLinker does not support split dwarf at the moment.
>
>None of these reasons is a blocker. I believe the implementation from
>D74169 could be integrated and incrementally improved if there were
>agreement on that.
>
>Using DWARFLinker from llvm-dwarfutil is another way to use and
>improve it.
>Once fully implemented, llvm-dwarfutil should solve the above three
>issues, and there would probably be more reasons to include
>DWARFLinker into lld.
>
>Even if we had the best solution, it would still be useful to have a
>tool like llvm-dwarfutil for cases when it is necessary to process
>already created binaries.
>
>So in short, the suggested tool - llvm-dwarfutil - is a step towards
>the ideal solution.
>Its benefit is that it could be used until we have created the best
>solution, or for cases where "the best solution" is not applicable.
>
>Thank you, Alexey.
>
>
>On 29.08.2020 00:23, James Y Knight wrote:
>>If we're designing a new tool and process, it would be wonderful if 
>>it did not require multiple stages of copying and slightly modifying 
>>the binary, in order to create final output with separate 
>>debug info. It seems to me that the variants of this sort of thing 
>>which exist today are somewhat suboptimal.
>>
>>With Mach-O and dsymutil:
>>  1. Given a collection of object files (which contain debuginfo), 
>>link a binary with ld. The binary then includes special references 
>>to the object files that were actually used as part of the link.
>>  2. Given the linked binary, and all of the same object files, link 
>>the debuginfo with dsymutil.
>>  3. Strip the references to the object file paths from the binary.
>>  Finally, you have a binary without debug info, and a dsym 
>>debuginfo file. But it would be better if the binary created in step 
>>1 didn't need to include the extraneous object-file path info, and 
>>that was instead emitted in a second file. Then we wouldn't need 
>>step 3.
>>
>>With "normal" ELF:
>>  1. Given a collection of object files (which contain debuginfo), 
>>link a binary with ld, which includes linking all the debug info 
>>into the binary.
>>  2. Given the linked binary, objcopy --only-keep-debug to create a 
>>new separated debug file.
>>  3. Given the linked binary, objcopy --strip-debug to create a copy 
>>of the binary without debug info.
>>  Finally you have a binary without debug info, and a separate debug 
>>file. But it would be better if the linker could just write the 
>>debug info into a separate file in the first place, then we'd only 
>>have the one step. (But, downside, the linker needs to manage all 
>>the debug info, which can be excessively large.)
>>
>>With "split-dwarf" ELF support:
>>  1. Given object files (which exclude /most/ but not all of the 
>>debuginfo), link a binary. The binary will include that smaller set 
>>of debug info.
>>  2. Given the collection of dwo files corresponding to the object 
>>files, run the "dwp" tool to create a dwp file.
>>  3. objcopy --only-keep-debug
>>  4. --strip-debug
>>  And then you need to keep both a debug file /and/ a dwp file, 
>>which is weird.
>>
>>
>>I think, ideally, users would have the following three /good/ options:
>>  Easy option: store debuginfo in the object files, and have the 
>>linker create a pair of {binary, separated dwarf-optimized 
>>debuginfo} files directly from the object files.
>>  More scalable option: emit (most of the) debuginfo in separate 
>>*.dwo files using -gsplit-dwarf, and then,
>>    1. run the linker on the object files to create a pair of 
>>{binary, separated debuginfo} files. In this case the latter file 
>>contains the minimal debuginfo which was in the object files.
>>    2. run a second tool, which reads the minimal debuginfo from 
>>above, and all the DWO files, and creates a full 
>>optimized/deduplicated debuginfo output file.
>>  Faster developer builds: Like previous, but omit step 2 -- running 
>>the debugger directly after step 1 can use the dwo files on-disk.
>>
>>I think we're not terribly far from that ideal, now, for ELF. Maybe 
>>only these three things need to be done? --
>>  1. Teach lld how to emit a separated debuginfo output file 
>>directly, without requiring an objcopy step.

This is very similar to Solaris's ancillary objects (ET_SUNW_ANCILLARY).
There are more details on http://www.linker-aliens.org/blogs/ali/entry/ancillary_objects_separate_debug_elf/
In short, Solaris's `ld -z ancillary[=outfile]` writes non-SHF_ALLOC sections to the
ancillary object. Perhaps we will need some coordination with GNU. Some
GNU folks are interested in a new object file type:
https://groups.google.com/forum/#!topic/generic-abi/tJq7anc6WKs


A debug file created by {,llvm-}objcopy --only-keep-debug has different
contents (see https://reviews.llvm.org/D67137 for details):
it keeps non-SHF_ALLOC sections and SHT_NOTE sections.
http://www.linker-aliens.org/blogs/ali/entry/ancillary_objects_separate_debug_elf/
does not say whether program headers are retained in the debug file, but
{,llvm-}objcopy --only-keep-debug keeps one copy (neither gdb nor lldb
needs the program headers).
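
To make the difference concrete, here is a minimal C++ sketch of the two
selection rules as described above. The helper names are hypothetical (they
are not lld or llvm-objcopy APIs); only the numeric values of SHT_PROGBITS,
SHT_NOTE and SHF_ALLOC come from the ELF gABI. It models only which sections
are selected and says nothing about program headers.

```cpp
#include <cstdint>

// Section type and flag values from the ELF gABI.
constexpr std::uint32_t ShtProgbits = 1;
constexpr std::uint32_t ShtNote = 7;
constexpr std::uint64_t ShfAlloc = 0x2;

// Hypothetical helper: the rule described for Solaris `ld -z ancillary`,
// i.e. non-SHF_ALLOC sections are written to the ancillary object.
constexpr bool goesToAncillaryObject(std::uint64_t Flags) {
  return (Flags & ShfAlloc) == 0;
}

// Hypothetical helper: the rule described for --only-keep-debug,
// i.e. non-SHF_ALLOC sections plus SHT_NOTE sections are kept.
constexpr bool keptByOnlyKeepDebug(std::uint32_t Type, std::uint64_t Flags) {
  return Type == ShtNote || (Flags & ShfAlloc) == 0;
}
```

Under these assumptions, an allocated note section such as a build ID note is
the case where the two rules disagree: keptByOnlyKeepDebug(ShtNote, ShfAlloc)
is true, while goesToAncillaryObject(ShfAlloc) is false.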

>>  2. Integrate DWARFLinker into lld.
>>  3. Create a new tool which takes the separated debuginfo and 
>>DWO/DWP files and uses DWARFLinker library to create a new 
>>(dwarf-linked) separated-debug file, that doesn't depend on DWO/DWP 
>>files.
>>
>>My hope is that the tool you're creating will be the implementation 
>>of #3, but I'm afraid the intent is for this tool to be an 
>>additional stage that non-split-dwarf users would need to run 
>>post-link, /instead of/ integrating DWARFLinker into lld.

>>On Tue, Aug 25, 2020 at 10:29 AM Alexey via llvm-dev
>><llvm-dev at lists.llvm.org> wrote:
>>
>>    Hi,
>>
>>       We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>>       Any thoughts on this?
>>       Thanks in advance, Alexey.
>>
>>    ======================================================================
>>
>>    llvm-dwarfutil (Apndx A) is a tool for processing the debug info
>>    (DWARF) located in built binary files, in order to improve debug
>>    info quality, reduce debug info size and accelerate debug info
>>    processing. Supported object file formats: ELF, MachO (Apndx B),
>>    COFF (Apndx C), WASM (Apndx C).
>>
>>    ======================================================================
>>
>>    Specifically, the tool would:
>>
>>       - Remove obsolete debug info which refers to code deleted by
>>         the linker during garbage collection (gc-sections).
>>
>>       - Deduplicate debug type definitions to reduce the size of the
>>         resulting binary.
>>
>>       - Build accelerator/index tables.
>>         = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>           .debug_pubtypes.
>>
>>       - Strip unneeded tables.
>>         = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>           .debug_pubtypes.
>>
>>       - Compress or decompress debug info as requested.
>>
>>    Possible feature:
>>
>>       - Join split dwarf .dwo files into a single file containing all
>>         debug info (convert split DWARF into monolithic DWARF).
>>
>>    ======================================================================
>>
>>    User interface:
>>
>>       OVERVIEW: A tool for optimizing debug info located in the built
>>       binary.
>>
>>       USAGE: llvm-dwarfutil [options] input output
>>
>>       OPTIONS: (Apndx E)
>>
>>    ======================================================================
>>
>>    Implementation notes:
>>
>>    1. Removing obsolete debug info would be done using the DWARFLinker
>>        llvm library.
>>
>>    2. Data type deduplication would be done using the DWARFLinker llvm
>>        library.
>>
>>    3. Accelerator/index tables would be generated using the DWARFLinker
>>        llvm library.
>>
>>    4. The interface of the DWARFLinker library would be changed so that
>>        it is possible to switch various stages on/off:
>>
>>       class DWARFLinker {
>>       public:
>>         void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>
>>         void setDoAppleNames(bool DoAppleNames = false);
>>         void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>         void setDoAppleTypes(bool DoAppleTypes = false);
>>         void setDoObjC(bool DoObjC = false);
>>         void setDoDebugPubNames(bool DoDebugPubNames = false);
>>         void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>
>>         void setDoDebugNames(bool DoDebugNames = false);
>>         void setDoGDBIndex(bool DoGDBIndex = false);
>>       };
>>
>>    5. Copying source file contents, stripping tables and
>>        compressing/decompressing tables would be done by the ObjCopy
>>        llvm library (extracted from llvm-objcopy):
>>
>>       Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                    object::COFFObjectFile &In,
>>                                    Buffer &Out);
>>       Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                    object::ELFObjectFileBase &In,
>>                                    Buffer &Out);
>>       Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                    object::MachOObjectFile &In,
>>                                    Buffer &Out);
>>       Error executeObjcopyOnBinary(const CopyConfig &Config,
>>                                    object::WasmObjectFile &In,
>>                                    Buffer &Out);
>>
>>    6. Address ranges and single addresses pointing to removed code
>>        should be marked with a tombstone value in the input file:
>>
>>        -2 for .debug_ranges and .debug_loc.
>>        -1 for other .debug* tables.
>>
>>    7. Prototype implementation - https://reviews.llvm.org/D86539.
>>
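
The tombstone rule in item 6 can be sketched in a few lines of C++. This is
only an illustration under stated assumptions: tombstoneFor and maxAddress are
hypothetical names (not DWARFLinker or lld APIs), and "-1"/"-2" are
interpreted here as the maximum address for the given address size and that
value minus one.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helper: the largest address representable in AddrSize bytes,
// i.e. what "-1" means for that address size.
constexpr std::uint64_t maxAddress(unsigned AddrSize) {
  return AddrSize >= 8 ? ~std::uint64_t{0}
                       : (std::uint64_t{1} << (AddrSize * 8)) - 1;
}

// Hypothetical helper: tombstone value for an address in a given section,
// following the proposal: -2 for .debug_ranges and .debug_loc, -1 for
// other .debug* tables.
constexpr std::uint64_t tombstoneFor(std::string_view Section,
                                     unsigned AddrSize) {
  const bool IsListSection =
      Section == ".debug_ranges" || Section == ".debug_loc";
  return IsListSection ? maxAddress(AddrSize) - 1 : maxAddress(AddrSize);
}
```

A consumer could then treat any address equal to tombstoneFor(Section,
AddrSize) as referring to removed code; that consumer side is likewise an
assumption about how the marker would be used.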
>>    ======================================================================
>>
>>    Roadmap:
>>
>>    1. Refactor llvm-objcopy to extract its implementation into a
>>        separate library, ObjCopy (in the LLVM tree).
>>
>>    2. Create a command line utility using the existing DWARFLinker and
>>        ObjCopy implementations. The first version is supposed to work
>>        only with ELF input object files. It would take an input ELF
>>        file with unoptimized debug info and create an output ELF file
>>        with optimized debug info. That version would be done out of
>>        the llvm tree.
>>
>>    3. Make the tool able to work in multi-threaded mode.
>>
>>    4. Consider including it in the LLVM tree.
>>
>>    5. Support DWARF5 tables.
>>
>>    ======================================================================
>>
>>    Appendix A. Should this tool be implemented as a new tool or as an
>>                 extension to dsymutil/llvm-objcopy?
>>
>>        There already exists a tool which removes obsolete debug info on
>>        darwin - dsymutil. Why create another tool instead of extending
>>        the existing dsymutil/llvm-objcopy?
>>
>>        The main functionality of dsymutil is located in a separate
>>        library - DWARFLinker. Thus, the dsymutil utility is a
>>        command-line interface for DWARFLinker. dsymutil has another
>>        type of input/output data: it takes several object files and an
>>        address map as input and creates a .dSYM bundle with linked
>>        debug info as output. llvm-dwarfutil would take a built
>>        executable as input and create an optimized executable as
>>        output. Additionally, there would be many command-line options
>>        specific to only one of the utilities. This means that these
>>        utilities (implementing the command-line interface) would
>>        differ significantly. It makes sense not to put another
>>        command-line utility inside the existing dsymutil, but to make
>>        it a separate utility. That is the reason why llvm-dwarfutil is
>>        suggested to be implemented not as a sub-part of dsymutil but
>>        as a separate tool.
>>
>>        Please share your preference: whether llvm-dwarfutil should be
>>        a separate utility, or a variant of dsymutil compiled for ELF?
>>
>>    ======================================================================
>>
>>    Appendix B. The MachO object file format is already supported by
>>        dsymutil. Depending on the decision whether llvm-dwarfutil
>>        would be done as a subproject of dsymutil or as a separate
>>        utility, MachO would be supported or not.
>>
>>    ======================================================================
>>
>>    Appendix C. Support for the COFF and WASM object file formats is
>>         presented as a possible future improvement. It would be quite
>>         easy to add them, given that llvm-objcopy already supports
>>         these formats. It would also require supporting the
>>         DWARF6-suggested tombstone values (-1/-2).
>>
>>    ======================================================================
>>
>>    Appendix D. Documentation.
>>
>>       - proposal for DWARF6 which suggested -1/-2 values for marking
>>         bad addresses:
>>         http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>       - dsymutil tool:
>>         https://llvm.org/docs/CommandGuide/dsymutil.html
>>       - proposal "Remove obsolete debug info in lld.":
>>         http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>
>>    ======================================================================
>>
>>    Appendix E. Possible command line options:
>>
>>    DwarfUtil Options:
>>
>>       --build-aranges           - generate .debug_aranges table.
>>       --build-debug-names       - generate .debug_names table.
>>       --build-debug-pubnames    - generate .debug_pubnames table.
>>       --build-debug-pubtypes    - generate .debug_pubtypes table.
>>       --build-gdb-index         - generate .gdb_index table.
>>       --compress                - Compress debug tables.
>>       --decompress              - Decompress debug tables.
>>       --deduplicate-types       - Do ODR deduplication for debug types.
>>       --garbage-collect         - Do garbage collecting for debug info.
>>       --num-threads=<n>         - Specify the maximum number (n) of
>>                                   simultaneous threads to use when
>>                                   optimizing input file. Defaults to
>>                                   the number of cores on the current
>>                                   machine.
>>       --strip-all               - Strip all debug tables.
>>       --strip=<name1,name2>     - Strip specified debug info tables.
>>       --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>       --tombstone=<value>       - Tombstone value used as a marker of
>>                                   invalid address.
>>         =bfd                    -   BFD default value
>>         =dwarf6                 -   Dwarf v6.
>>       --verbose                 - Enable verbose logging and encoding
>>                                   details.
>>
>>    Generic Options:
>>
>>       --help                    - Display available options
>>                                   (--help-hidden for more)
>>       --version                 - Display the version of this program
>>
>>    _______________________________________________
>>    LLVM Developers mailing list
>>    llvm-dev at lists.llvm.org
>>    https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>



