[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Alexey via llvm-dev
llvm-dev at lists.llvm.org
Thu Sep 3 04:07:17 PDT 2020
On 01.09.2020 06:24, David Blaikie wrote:
> On Fri, Aug 28, 2020 at 2:24 PM James Y Knight <jyknight at google.com>
> wrote:
>
> If we're designing a new tool and process, it would be wonderful
> if it did not require multiple stages of copying and slightly
> modifying the binary, in order to create final output with
> separate debug info. It seems to me that the variants of this sort
> of thing which exist today are somewhat suboptimal.
>
> With Mach-O and dsymutil:
> 1. Given a collection of object files (which contain debuginfo),
> link a binary with ld. The binary then includes special references
> to the object files that were actually used as part of the link.
> 2. Given the linked binary, and all of the same object files,
> link the debuginfo with dsymutil.
> 3. Strip the references to the object file paths from the binary.
> Finally, you have a binary without debug info, and a dsym
> debuginfo file. But it would be better if the binary created in
> step 1 didn't need to include the extraneous object-file path
> info, and that was instead emitted in a second file. Then we
> wouldn't need step 3.
>
> With "normal" ELF:
> 1. Given a collection of object files (which contain debuginfo),
> link a binary with ld, which includes linking all the debug info
> into the binary.
> 2. Given the linked binary, objcopy --only-keep-debug to create
> a new separated debug file.
> 3. Given the linked binary, objcopy --strip-debug to create a
> copy of the binary without debug info.
> Finally you have a binary without debug info, and a separate
> debug file. But it would be better if the linker could just write
> the debug info into a separate file in the first place, then we'd
> only have the one step. (But, downside, the linker needs to manage
> all the debug info, which can be excessively large.)
>
> With "split-dwarf" ELF support:
> 1. Given object files (which exclude /most/ but not all of the
> debuginfo), link a binary. The binary will include that smaller
> set of debug info.
> 2. Given the collection of dwo files corresponding to the object
> files, run the "dwp" tool to create a dwp file.
> 3. objcopy --only-keep-debug
> 4. --strip-debug
> And then you need to keep both a debug file /and/ a dwp file,
> which is weird.
>
>
> I think, ideally, users would have the following three /good/ options:
> Easy option: store debuginfo in the object files, and have the
> linker create a pair of {binary, separated dwarf-optimized
> debuginfo} files directly from the object files.
>
>
> (as discussed by other replies - that was an early proposal, didn't
> gain a lot of traction/Eric & Ray weren't super convinced it was worth
> adding to lld at this stage, given the link time cost & thus the small
> expected user base)
>
> More scalable option: emit (most of the) debuginfo in separate
> *.dwo files using -gsplit-dwarf, and then,
> 1. run the linker on the object files to create a pair of
> {binary, separated debuginfo} files. In this case the latter file
> contains the minimal debuginfo which was in the object files.
>
>
> Yeah, that ^ is probably a nice feature regardless. Save folks an
> extra objcopy, etc. Usable right now for any build that is already
> running only-keep-debug/strip-debug.
>
> 2. run a second tool, which reads the minimal debuginfo from
> above, and all the DWO files, and creates a full
> optimized/deduplicated debuginfo output file.
>
>
> Fair - this then looks a lot like the MachO debug info
> distribution/linking model (with the advantage that the DWARF isn't in
> the .o files, so doesn't have to be shipped to the machine doing the
> linking), so far as I know.
>
> Faster developer builds: Like previous, but omit step 2 --
> running the debugger directly after step 1 can use the dwo files
> on-disk.
>
> I think we're not terribly far from that ideal, now, for ELF.
> Maybe only these three things need to be done? --
> 1. Teach lld how to emit a separated debuginfo output file
> directly, without requiring an objcopy step.
> 2. Integrate DWARFLinker into lld.
> 3. Create a new tool which takes the separated debuginfo and
> DWO/DWP files and uses DWARFLinker library to create a new
> (dwarf-linked) separated-debug file, that doesn't depend on
> DWO/DWP files.
>
> My hope is that the tool you're creating will be the
> implementation of #3, but I'm afraid the intent is for this tool
> to be an additional stage that non-split-dwarf users would need to
> run post-link, /instead of/ integrating DWARFLinker into lld.
>
>
> Yeah, that's the direction lld folks have pushed for - a
> post-processing, rather than link-time. Mostly due to the current
> performance of DWARF-aware linking being quite slow, so the idea that
> not many users would be willing to take that link-time performance hit
> to use the feature. (whereas as a post-processing step before
> archiving DWARF (like building a dwp from dwo files) it might be more
> appealing/interesting - and maybe with sufficient performance
> improvements, could then be rolled into lld as originally proposed)
>
> Curiously, Alexey's needs include not wanting to use fission, because a
> single debuggable binary simplifies his users' use case and is easier
> to distribute than two files. So he's probably not interested in the
> strip-debug/only-keep-debug kind of debug info distribution model, at
> least for his own users/use case. So far as I understand it.
>
> I've got mixed feelings about that - and encourage you to
> express/clarify/discuss your thoughts here, as I think the whole
> conversation could use some more voices.
It is not that we are uninterested in the strip-debug/only-keep-debug
kind of debug info distribution model.
But our customers also find useful the model where optimized debug info
is already placed in the binary.
It is a bit more convenient to hand a single binary to someone else to
debug. It is also a bit more convenient to manage and keep a single
binary with debug info for daily builds, so that possible problems can
be evaluated quickly. Using a separate stripped debug info file
presupposes some process for working with it (how it is stored, how it
is distributed). Such a process makes sense when binaries are shared
with customers. But when debug builds are shared inside an organization,
it might be more convenient to share just a single file.
Thus, it would be convenient if the tools supported both scenarios.
>
> - Dave
>
>
> On Tue, Aug 25, 2020 at 10:29 AM Alexey via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
> Any thoughts on this?
> Thanks in advance, Alexey.
>
> ======================================================================
>
> llvm-dwarfutil (Apndx A) is a tool for processing debug info (DWARF)
> located in built binary files to improve debug info quality,
> reduce debug info size and accelerate debug info processing.
> Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
> WASM (Apndx C).
>
> ======================================================================
>
> Specifically, the tool would do:
>
> - Remove obsolete debug info which refers to code deleted by the linker
>   doing the garbage collection (gc-sections).
>
> - Deduplicate debug type definitions to reduce the resulting size of
>   the binary.
>
> - Build accelerator/index tables.
> = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
> .debug_pubtypes.
>
> - Strip unneeded tables.
> = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
> .debug_pubtypes.
>
> - Compress or decompress debug info as requested.
>
> Possible feature:
>
> - Join split dwarf .dwo files in a single file containing all debug info
>   (convert split DWARF into monolithic DWARF).
>
> ======================================================================
>
> User interface:
>
> OVERVIEW: A tool for optimizing debug info located in the built binary.
>
> USAGE: llvm-dwarfutil [options] input output
>
> OPTIONS: (Apndx E)
>
> ======================================================================
>
> Implementation notes:
>
> 1. Removing obsolete debug info would be done using the DWARFLinker LLVM
>    library.
>
> 2. Data type deduplication would be done using the DWARFLinker LLVM
>    library.
>
> 3. Accelerator/index tables would be generated using the DWARFLinker LLVM
>    library.
>
> 4. The interface of the DWARFLinker library would be changed so that
>    various stages can be switched on or off:
>
> class DWARFLinker {
> public:
>   // Each setter switches one linking stage on or off.
>   void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>
>   void setDoAppleNames(bool DoAppleNames = false);
>   void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>   void setDoAppleTypes(bool DoAppleTypes = false);
>   void setDoObjC(bool DoObjC = false);
>   void setDoDebugPubNames(bool DoDebugPubNames = false);
>   void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>
>   void setDoDebugNames(bool DoDebugNames = false);
>   void setDoGDBIndex(bool DoGDBIndex = false);
> };
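To make item 4 a bit more concrete, here is a minimal sketch of how a
driver could map such on/off switches to the list of stages to run.
LinkStages and enabledStages are hypothetical names invented for this
illustration; they are not the DWARFLinker interface, and the stage
labels are placeholders.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the proposed switches. This is NOT the real
// llvm::DWARFLinker interface; every stage is off unless requested.
struct LinkStages {
  bool RemoveObsoleteInfo = false;
  bool DebugPubNames = false;
  bool DebugPubTypes = false;
  bool DebugNames = false;
  bool GDBIndex = false;
};

// Returns placeholder labels of the enabled stages, in a fixed order.
std::vector<std::string> enabledStages(const LinkStages &S) {
  std::vector<std::string> Result;
  if (S.RemoveObsoleteInfo) Result.push_back("remove-obsolete-info");
  if (S.DebugPubNames) Result.push_back(".debug_pubnames");
  if (S.DebugPubTypes) Result.push_back(".debug_pubtypes");
  if (S.DebugNames) Result.push_back(".debug_names");
  if (S.GDBIndex) Result.push_back(".gdb_index");
  return Result;
}
```

With all flags left at their defaults nothing is run, which matches the
"= false" defaults in the proposed setters.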
>
> 5. Copying source file contents, stripping tables, and
>    compressing/decompressing tables would be done by the ObjCopy LLVM
>    library (extracted from llvm-objcopy):
>
> Error executeObjcopyOnBinary(const CopyConfig &Config,
>                              object::COFFObjectFile &In,
>                              Buffer &Out);
> Error executeObjcopyOnBinary(const CopyConfig &Config,
>                              object::ELFObjectFileBase &In,
>                              Buffer &Out);
> Error executeObjcopyOnBinary(const CopyConfig &Config,
>                              object::MachOObjectFile &In,
>                              Buffer &Out);
> Error executeObjcopyOnBinary(const CopyConfig &Config,
>                              object::WasmObjectFile &In,
>                              Buffer &Out);
>
> 6. Address ranges and single addresses pointing to removed code should
>    be marked with a tombstone value in the input file:
>
>    -2 for .debug_ranges and .debug_loc.
>    -1 for other .debug* tables.
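As an illustration of item 6, a minimal sketch of choosing the tombstone
per section. tombstoneFor is a hypothetical helper, not an LLVM API, and
it assumes that -1 and -2 are read as unsigned values of the target
address size. The -2 case exists because in .debug_ranges and .debug_loc
an entry whose first address is all ones is a base address selection
entry, so -1 cannot serve as a marker there.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helper (not part of LLVM): returns the tombstone value for
// an address stored in the given section, for a target whose addresses
// are AddrSize bytes wide (assumed to be between 1 and 8).
constexpr std::uint64_t tombstoneFor(std::string_view Section,
                                     unsigned AddrSize) {
  // -1 as an unsigned value of the target address size.
  const std::uint64_t Max =
      AddrSize >= 8 ? UINT64_MAX
                    : ((std::uint64_t{1} << (AddrSize * 8)) - 1);
  // -1 already means "base address selection" in these two sections,
  // so they get -2 instead.
  const bool IsListSection =
      Section == ".debug_ranges" || Section == ".debug_loc";
  return IsListSection ? Max - 1 : Max;
}
```

For example, tombstoneFor(".debug_ranges", 8) gives 0xFFFFFFFFFFFFFFFE,
while tombstoneFor(".debug_info", 8) gives 0xFFFFFFFFFFFFFFFF.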
>
> 7. Prototype implementation - https://reviews.llvm.org/D86539.
>
> ======================================================================
>
> Roadmap:
>
> 1. Refactor llvm-objcopy to extract its implementation into a separate
>    library, ObjCopy (in the LLVM tree).
>
> 2. Create a command-line utility using the existing DWARFLinker and
>    ObjCopy implementations. The first version is supposed to work only
>    with ELF input object files. It would take an input ELF file with
>    unoptimized debug info and create an output ELF file with optimized
>    debug info. That version would be done out of the LLVM tree.
>
> 3. Make the tool able to work in multi-threaded mode.
>
> 4. Consider including it in the LLVM tree.
>
> 5. Support DWARF5 tables.
>
> ======================================================================
>
> Appendix A. Should this tool be implemented as a new tool or as an
> extension to dsymutil/llvm-objcopy?
>
> There already exists a tool which removes obsolete debug info on
> darwin - dsymutil. Why create another tool instead of extending the
> existing dsymutil/llvm-objcopy?
>
> The main functionality of dsymutil is located in a separate library,
> DWARFLinker. Thus, the dsymutil utility is a command-line interface for
> DWARFLinker. dsymutil has a different type of input/output data: it
> takes several object files and an address map as input and creates a
> .dSYM bundle with linked debug info as output. llvm-dwarfutil would
> take a built executable as input and create an optimized executable as
> output. Additionally, there would be many command-line options specific
> to only one utility. This means that these utilities (implementing the
> command-line interface) would differ significantly. It makes sense not
> to put another command-line utility inside the existing dsymutil, but
> to make it a separate utility. That is the reason why llvm-dwarfutil is
> suggested to be implemented not as a sub-part of dsymutil but as a
> separate tool.
>
> Please share your preference: should llvm-dwarfutil be a separate
> utility, or a variant of dsymutil compiled for ELF?
>
> ======================================================================
>
> Appendix B. The MachO object file format is already supported by
> dsymutil. Whether MachO would be supported depends on the decision
> whether llvm-dwarfutil is done as a subproject of dsymutil or as a
> separate utility.
>
> ======================================================================
>
> Appendix C. Support for the COFF and WASM object file formats is
> presented as a possible future improvement. It would be quite easy to
> add them, given that llvm-objcopy already supports these formats. It
> would also require supporting the DWARF6-suggested tombstone values
> (-1/-2).
>
> ======================================================================
>
> Appendix D. Documentation.
>
> - proposal for DWARF6 which suggested -1/-2 values for marking bad
>   addresses
>   http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
> - dsymutil tool
>   https://llvm.org/docs/CommandGuide/dsymutil.html
> - proposal "Remove obsolete debug info in lld."
>   http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>
> ======================================================================
>
> Appendix E. Possible command line options:
>
> DwarfUtil Options:
>
> --build-aranges           - generate .debug_aranges table.
> --build-debug-names       - generate .debug_names table.
> --build-debug-pubnames    - generate .debug_pubnames table.
> --build-debug-pubtypes    - generate .debug_pubtypes table.
> --build-gdb-index         - generate .gdb_index table.
> --compress                - Compress debug tables.
> --decompress              - Decompress debug tables.
> --deduplicate-types       - Do ODR deduplication for debug types.
> --garbage-collect         - Do garbage collecting for debug info.
> --num-threads=<n>         - Specify the maximum number (n) of
>                             simultaneous threads to use when
>                             optimizing input file. Defaults to the
>                             number of cores on the current machine.
> --strip-all               - Strip all debug tables.
> --strip=<name1,name2>     - Strip specified debug info tables.
> --strip-unoptimized-debug - Strip all unoptimized debug tables.
> --tombstone=<value>       - Tombstone value used as a marker of
>                             invalid address.
>   =bfd                    -   BFD default value
>   =dwarf6                 -   Dwarf v6.
> --verbose                 - Enable verbose logging and encoding
>                             details.
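For the list-valued option above, a minimal sketch of how the value of
--strip=<name1,name2> could be split into table names. splitStripList is
a hypothetical helper written only for illustration; an actual
implementation would more likely rely on LLVM's own option parsing, and
skipping empty entries is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits the value of --strip=<name1,name2> into
// individual table names. Empty entries (as in "a,,b") are skipped.
std::vector<std::string> splitStripList(std::string_view Value) {
  std::vector<std::string> Names;
  std::size_t Pos = 0;
  while (Pos <= Value.size()) {
    std::size_t Comma = Value.find(',', Pos);
    if (Comma == std::string_view::npos)
      Comma = Value.size(); // last entry runs to the end of the value
    if (Comma > Pos)
      Names.emplace_back(Value.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
  }
  return Names;
}
```

Called on ".debug_aranges,.gdb_index" this yields the two names
.debug_aranges and .gdb_index.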
>
> Generic Options:
>
> --help                    - Display available options (--help-hidden
>                             for more)
> --version                 - Display the version of this program
>