[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

Thu Sep 3 04:07:17 PDT 2020

On 01.09.2020 06:24, David Blaikie wrote:
> On Fri, Aug 28, 2020 at 2:24 PM James Y Knight <jyknight at google.com> wrote:
>
>     If we're designing a new tool and process, it would be wonderful
>     if it did not require multiple stages of copying and slightly
>     modifying the binary, in order to create final output with
>     separate debug info. It seems to me that the variants of this sort
>     of thing which exist today are somewhat suboptimal.
>
>     With Mach-O and dsymutil:
>       1. Given a collection of object files (which contain debuginfo),
>     link a binary with ld. The binary then includes special references
>     to the object files that were actually used as part of the link.
>       2. Given the linked binary, and all of the same object files,
>     link the debuginfo with dsymutil.
>       3. Strip the references to the object file paths from the binary.
>       Finally, you have a binary without debug info, and a dsym
>     debuginfo file. But it would be better if the binary created in
>     step 1 didn't need to include the extraneous object-file path
>     info, and that was instead emitted in a second file. Then we
>     wouldn't need step 3.
>
>     With "normal" ELF:
>       1. Given a collection of object files (which contain debuginfo),
>     link a binary with ld, which includes linking all the debug info
>     into the binary.
>       2. Given the linked binary, objcopy --only-keep-debug to create
>     a new separated debug file.
>       3. Given the linked binary, objcopy --strip-debug to create a
>     copy of the binary without debug info.
>       Finally you have a binary without debug info, and a separate
>     debug file. But it would be better if the linker could just write
>     the debug info into a separate file in the first place, then we'd
>     only have the one step. (But, downside, the linker needs to manage
>     all the debug info, which can be excessively large.)
>
>     With "split-dwarf" ELF support:
>       1. Given object files (which exclude /most/ but not all of the
>     debuginfo), link a binary. The binary will include that smaller
>     set of debug info.
>       2. Given the collection of dwo files corresponding to the object
>     files, run the "dwp" tool to create a dwp file.
>       3. objcopy --only-keep-debug
>       4. --strip-debug
>       And then you need to keep both a debug file /and/ a dwp file,
>     which is weird.
>
>
>     I think, ideally, users would have the following three /good/ options:
>       Easy option: store debuginfo in the object files, and have the
>     linker create a pair of {binary, separated dwarf-optimized
>     debuginfo} files directly from the object files.
>
>
> (as discussed by other replies - that was an early proposal, didn't 
> gain a lot of traction/Eric & Ray weren't super convinced it was worth 
> adding to lld at this stage, given the link time cost & thus the small 
> expected user base)
>
>       More scalable option: emit (most of the) debuginfo in separate
>     *.dwo files using -gsplit-dwarf, and then,
>         1. run the linker on the object files to create a pair of
>     {binary, separated debuginfo} files. In this case the latter file
>     contains the minimal debuginfo which was in the object files.
>
>
> Yeah, that ^ is probably a nice feature regardless. Save folks an 
> extra objcopy, etc. Usable right now for any build that is already 
> running only-keep-debug/strip-debug.
>
>         2. run a second tool, which reads the minimal debuginfo from
>     above, and all the DWO files, and creates a full
>     optimized/deduplicated debuginfo output file.
>
>
> Fair - this then looks a lot like the MachO debug info 
> distribution/linking model (with the advantage that the DWARF isn't in 
> the .o files, so doesn't have to be shipped to the machine doing the 
> linking), so far as I know.
>
>       Faster developer builds: Like previous, but omit step 2 --
>     running the debugger directly after step 1 can use the dwo files
>     on-disk.
>
>     I think we're not terribly far from that ideal, now, for ELF.
>     Maybe only these three things need to be done? --
>       1. Teach lld how to emit a separated debuginfo output file
>     directly, without requiring an objcopy step.
>       2. Integrate DWARFLinker into lld.
>       3. Create a new tool which takes the separated debuginfo and
>     DWO/DWP files and uses DWARFLinker library to create a new
>     (dwarf-linked) separated-debug file, that doesn't depend on
>     DWO/DWP files.
>
>     My hope is that the tool you're creating will be the
>     implementation of #3, but I'm afraid the intent is for this tool
>     to be an additional stage that non-split-dwarf users would need to
>     run post-link, /instead of/ integrating DWARFLinker into lld.
>
>
> Yeah, that's the direction lld folks have pushed for - a 
> post-processing, rather than link-time. Mostly due to the current 
> performance of DWARF-aware linking being quite slow, so the idea that 
> not many users would be willing to take that link-time performance hit 
> to use the feature. (whereas as a post-processing step before 
> archiving DWARF (like building a dwp from dwo files) it might be more 
> appealing/interesting - and maybe with sufficient performance 
> improvements, could then be rolled into lld as originally proposed)
>
> Curiously Alexey's needs include not wanting to use fission because a 
> single debuggable binary simplifies his users' use case/makes it easier 
> to distribute than two files. So he's probably not interested in the 
> strip-debug/only-keep-debug kind of debug info distribution model, at 
> least for his own users/use case. So far as I understand it.
>
> I've got mixed feelings about that - and encourage you to 
> express/clarify/discuss your thoughts here, as I think the whole 
> conversation could use some more voices.
It is not that we are not interested in the strip-debug/only-keep-debug
kind of debug info distribution model.
But our customers have also found useful the model in which optimized debug
info is already put into the binary.
It is a bit more convenient to pass a single binary to someone else to
debug. It is also a bit more convenient to manage/keep a single binary
with debug info for daily builds, to be able to quickly evaluate possible
problems. Using a stripped debug info file assumes some process for
working with it (how it is stored, how it is distributed). Such a process
makes sense when binaries are shared with customers. But when debug builds
are shared inside an organization, it might be more convenient to share
just a single file.

Thus, it would be convenient if the tools supported both scenarios.
>
> - Dave
>
>
>     On Tue, Aug 25, 2020 at 10:29 AM Alexey via llvm-dev
>     <llvm-dev at lists.llvm.org> wrote:
>
>         Hi,
>
>            We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>            Any thoughts on this?
>            Thanks in advance, Alexey.
>
>         ======================================================================
>
>         llvm-dwarfutil (Apndx A) is a tool for processing debug info (DWARF)
>         located in built binary files to improve debug info quality,
>         reduce debug info size and accelerate debug info processing.
>         Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
>         WASM (Apndx C).
>
>         ======================================================================
>
>         Specifically, the tool would do:
>
>            - Remove obsolete debug info which refers to code deleted by the
>              linker during garbage collection (gc-sections).
>
>            - Deduplicate debug type definitions to reduce the size of the
>              resulting binary.
>
>            - Build accelerator/index tables.
>              = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>                .debug_pubtypes.
>
>            - Strip unneeded tables.
>              = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>                .debug_pubtypes.
>
>            - Compress or decompress debug info as requested.
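To make the "Strip unneeded tables" item a bit more concrete, here is a minimal C++ sketch of how a value passed to --strip=<name1,name2> (see Appendix E) could be turned into a set of names and applied to a list of sections. The helper names parseStripList and keepSections are hypothetical and are not taken from llvm-objcopy or from the prototype; the sketch also assumes that tables are identified by their section names.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the value of --strip=<name1,name2> on commas.
// Empty entries (e.g. from a trailing comma) are ignored.
std::set<std::string> parseStripList(const std::string &Value) {
  std::set<std::string> Names;
  std::stringstream Stream(Value);
  std::string Name;
  while (std::getline(Stream, Name, ','))
    if (!Name.empty())
      Names.insert(Name);
  return Names;
}

// Hypothetical helper: return the sections that are NOT requested to be
// stripped, preserving their original order.
std::vector<std::string> keepSections(const std::vector<std::string> &Sections,
                                      const std::set<std::string> &ToStrip) {
  std::vector<std::string> Kept;
  for (const std::string &Section : Sections)
    if (ToStrip.count(Section) == 0)
      Kept.push_back(Section);
  return Kept;
}
```

In a real tool the actual removal would presumably be delegated to the ObjCopy library; this only illustrates the selection step.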
>
>         Possible feature:
>
>            - Join split DWARF .dwo files into a single file containing all
>              debug info (convert split DWARF into monolithic DWARF).
>
>         ======================================================================
>
>         User interface:
>
>            OVERVIEW: A tool for optimizing debug info located in the built
>            binary.
>
>            USAGE: llvm-dwarfutil [options] input output
>
>            OPTIONS: (Apndx E)
>
>         ======================================================================
>
>         Implementation notes:
>
>         1. Removing obsolete debug info would be done using the DWARFLinker
>            LLVM library.
>
>         2. Data type deduplication would be done using the DWARFLinker LLVM
>            library.
>
>         3. Accelerator/index tables would be generated using the DWARFLinker
>            LLVM library.
>
>         4. The interface of the DWARFLinker library would be changed so that
>            various stages can be switched on/off:
>
>            class DWARFLinker {
>            public:
>              void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>
>              void setDoAppleNames(bool DoAppleNames = false);
>              void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>              void setDoAppleTypes(bool DoAppleTypes = false);
>              void setDoObjC(bool DoObjC = false);
>              void setDoDebugPubNames(bool DoDebugPubNames = false);
>              void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>
>              void setDoDebugNames(bool DoDebugNames = false);
>              void setDoGDBIndex(bool DoGDBIndex = false);
>            };
>
>         5. Copying source file contents, stripping tables and
>            compressing/decompressing tables would be done by the ObjCopy LLVM
>            library (extracted from llvm-objcopy):
>
>            Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                         object::COFFObjectFile &In,
>                                         Buffer &Out);
>            Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                         object::ELFObjectFileBase &In,
>                                         Buffer &Out);
>            Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                         object::MachOObjectFile &In,
>                                         Buffer &Out);
>            Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                         object::WasmObjectFile &In,
>                                         Buffer &Out);
>
>         6. Address ranges and single addresses pointing to removed code
>            should be marked with a tombstone value in the input file:
>
>             -2 for .debug_ranges and .debug_loc.
>             -1 for other .debug* tables.
>
>         7. Prototype implementation - https://reviews.llvm.org/D86539.
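As an illustration of item 6 above, a minimal C++ sketch of how a reader of the input file might recognize addresses that were marked as dead. The function names tombstoneFor and isDeadAddress are hypothetical and are not part of DWARFLinker; the sketch assumes the tombstone is the two's-complement -1 or -2 truncated to the target address size. As far as I understand, .debug_ranges and .debug_loc get -2 because in those sections an entry starting with the maximum address (-1) already means "base address selection".

```cpp
#include <cstdint>
#include <string>

// Largest address representable with AddressSize bytes (4 or 8 expected),
// i.e. the value of -1 truncated to the target address width.
constexpr uint64_t maxAddress(unsigned AddressSize) {
  return AddressSize >= 8 ? UINT64_MAX
                          : ((uint64_t{1} << (AddressSize * 8)) - 1);
}

// Hypothetical helper: tombstone value expected in a given section,
// following item 6: -2 for .debug_ranges and .debug_loc, -1 otherwise.
uint64_t tombstoneFor(const std::string &Section, unsigned AddressSize) {
  const uint64_t Max = maxAddress(AddressSize);
  if (Section == ".debug_ranges" || Section == ".debug_loc")
    return Max - 1; // -2
  return Max;       // -1
}

// Hypothetical helper: true if Address read from Section marks removed code.
bool isDeadAddress(uint64_t Address, const std::string &Section,
                   unsigned AddressSize) {
  return Address == tombstoneFor(Section, AddressSize);
}
```

With such a check the tool could drop a DIE or a range entry whose start address is a tombstone, without having to guess from zero or overlapping addresses.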
>
>         ======================================================================
>
>         Roadmap:
>
>         1. Refactor llvm-objcopy to extract its implementation into a
>            separate library, ObjCopy (in the LLVM tree).
>
>         2. Create a command-line utility using the existing DWARFLinker and
>            ObjCopy implementations. The first version is supposed to work
>            with ELF input object files only. It would take an input ELF file
>            with unoptimized debug info and create an output ELF file with
>            optimized debug info. That version would be done out of the LLVM
>            tree.
>
>         3. Make the tool able to work in multi-threaded mode.
>
>         4. Consider including it in the LLVM tree.
>
>         5. Support DWARF5 tables.
>
>         ======================================================================
>
>         Appendix A. Should this tool be implemented as a new tool or as an
>                      extension to dsymutil/llvm-objcopy?
>
>             There already exists a tool which removes obsolete debug info on
>             darwin - dsymutil. Why create another tool instead of extending
>             the existing dsymutil/llvm-objcopy?
>
>             The main functionality of dsymutil is located in a separate
>             library - DWARFLinker. Thus, the dsymutil utility is a
>             command-line interface for DWARFLinker. dsymutil has another
>             type of input/output data: it takes several object files and an
>             address map as input and creates a .dSYM bundle with linked debug
>             info as output. llvm-dwarfutil would take a built executable as
>             input and create an optimized executable as output.
>             Additionally, there would be many command-line options specific
>             to only one utility. This means that these utilities
>             (implementing the command-line interface) would differ
>             significantly. It makes sense not to put another command-line
>             utility inside the existing dsymutil, but to make it a separate
>             utility. That is the reason why llvm-dwarfutil is suggested to
>             be implemented not as a sub-part of dsymutil but as a separate
>             tool.
>
>             Please share your preference: whether llvm-dwarfutil should be
>             a separate utility, or a variant of dsymutil compiled for ELF?
>
>         ======================================================================
>
>         Appendix B. The MachO object file format is already supported by
>             dsymutil. Depending on the decision whether llvm-dwarfutil would
>             be done as a subproject of dsymutil or as a separate utility,
>             MachO would be supported or not.
>
>         ======================================================================
>
>         Appendix C. Support for the COFF and WASM object file formats is
>              presented as a possible future improvement. It would be quite
>              easy to add them, given that llvm-objcopy already supports
>              these formats. It would also require supporting the
>              DWARF6-suggested tombstone values (-1/-2).
>
>         ======================================================================
>
>         Appendix D. Documentation.
>
>            - proposal for DWARF6 which suggested -1/-2 values for marking
>              bad addresses
>              http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>            - dsymutil tool
>              https://llvm.org/docs/CommandGuide/dsymutil.html
>            - proposal "Remove obsolete debug info in lld."
>              http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>
>         ======================================================================
>
>         Appendix E. Possible command line options:
>
>         DwarfUtil Options:
>
>            --build-aranges           - generate .debug_aranges table.
>            --build-debug-names       - generate .debug_names table.
>            --build-debug-pubnames    - generate .debug_pubnames table.
>            --build-debug-pubtypes    - generate .debug_pubtypes table.
>            --build-gdb-index         - generate .gdb_index table.
>            --compress                - Compress debug tables.
>            --decompress              - Decompress debug tables.
>            --deduplicate-types       - Do ODR deduplication for debug types.
>            --garbage-collect         - Do garbage collecting for debug info.
>            --num-threads=<n>         - Specify the maximum number (n) of
>                                        simultaneous threads to use when
>                                        optimizing input file. Defaults to
>                                        the number of cores on the current
>                                        machine.
>            --strip-all               - Strip all debug tables.
>            --strip=<name1,name2>     - Strip specified debug info tables.
>            --strip-unoptimized-debug - Strip all unoptimized debug tables.
>            --tombstone=<value>       - Tombstone value used as a marker of
>                                        invalid address.
>              =bfd                    -   BFD default value
>              =dwarf6                 -   Dwarf v6.
>            --verbose                 - Enable verbose logging and encoding
>                                        details.
>
>         Generic Options:
>
>            --help                    - Display available options
>                                        (--help-hidden for more)
>            --version                 - Display the version of this program
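For the --num-threads default described above, a minimal C++ sketch of how the value could be resolved. The function name resolveNumThreads is hypothetical, and treating 0 as "option not given" is an assumption of this sketch, not something the proposal specifies. Note that std::thread::hardware_concurrency() reports hardware thread contexts rather than physical cores and may return 0 when the value is unknown, so a fallback of 1 is used.

```cpp
#include <thread>

// Hypothetical helper: pick the number of worker threads.
// Requested == 0 is assumed to mean that --num-threads was not specified.
unsigned resolveNumThreads(unsigned Requested) {
  if (Requested != 0)
    return Requested;
  // May legitimately be 0 if the implementation cannot determine it.
  const unsigned Hardware = std::thread::hardware_concurrency();
  return Hardware != 0 ? Hardware : 1;
}
```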
>
>         _______________________________________________
>         LLVM Developers mailing list
>         llvm-dev at lists.llvm.org
>         https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>