[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

Alexey via llvm-dev llvm-dev at lists.llvm.org
Thu Aug 27 13:48:15 PDT 2020


Hi Jonas, please find my comments below...

On 27.08.2020 02:05, Jonas Devlieghere wrote:
> Hey Alexey,
>
> I haven't had time to look at the corresponding patch yet, but I hope 
> to do that soon. Here are my initial thoughts on the proposal.
>
> On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>     Hi,
>
>        We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>        Any thoughts on this?
>        Thanks in advance, Alexey.
>
>     ======================================================================
>
>     llvm-dwarfutil(Apndx A) - is a tool that is used for processing debug
>     info(DWARF)
>     located in built binary files to improve debug info quality,
>     reduce debug info size and accelerate debug info processing.
>     Supported object files formats: ELF, MachO(Apndx B), COFF(Apndx C),
>     WASM(Apndx C).
>
>     ======================================================================
>
>     Specifically, the tool would do:
>
>        - Remove obsolete debug info which refers to code deleted by the
>          linker doing the garbage collection (gc-sections).
>
>        - Deduplicate debug type definitions to reduce the size of the
>          resulting binary.
>
>        - Build accelerator/index tables.
>          = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>     .debug_pubtypes.
>
>        - Strip unneeded tables.
>          = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>     .debug_pubtypes.
>
>        - Compress or decompress debug info as requested.
>
>     Possible feature:
>
>        - Join split dwarf .dwo files in a single file containing all
>     debug info
>          (convert split DWARF into monolithic DWARF).
>
>     ======================================================================
>
>     User interface:
>
>        OVERVIEW: A tool for optimizing debug info located in the built
>     binary.
>
>        USAGE: llvm-dwarfutil [options] input output
>
>
> Nit: I would make the output a separate flag with `-o` for consistency 
> with other similar tools.

Ok.


>
>        OPTIONS: (Apndx E)
>
>     ======================================================================
>
>     Implementation notes:
>
>     1. Removing obsolete debug info would be done using DWARFLinker llvm
>     library.
>
>     2. Data types deduplication would be done using DWARFLinker llvm
>     library.
>
>     3. Accelerator/index tables would be generated using DWARFLinker llvm
>     library.
>
>
> This sounds reasonable to me. I think there is value in having all 
> this in LLVM because LLD wants to use a subset of this functionality. 
> If it weren't for that I'd probably prefer to have this isolated to 
> just the tool.
>
>
>     4. Interface of DWARFLinker library would be changed in such way
>     that it
>         would be possible to switch on/off various stages:
>
>        class DWARFLinker {
>          setDoRemoveObsoleteInfo ( bool DoRemoveObsoleteInfo = false);
>
>          setDoAppleNames ( bool DoAppleNames = false );
>          setDoAppleNamespaces ( bool DoAppleNamespaces = false );
>          setDoAppleTypes ( bool DoAppleTypes = false );
>          setDoObjC ( bool DoObjC = false );
>          setDoDebugPubNames ( bool DoDebugPubNames = false );
>          setDoDebugPubTypes ( bool DoDebugPubTypes = false );
>
>          setDoDebugNames (bool DoDebugNames = false);
>          setDoGDBIndex (bool DoGDBIndex = false);
>        }
>
>
> We can discuss this in the patch, but in dsymutil we pass LinkOption 
> to the linker. I think that would work great for enabling certain 
> functionality.
OK, let's discuss this in the patch.
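
To make the "pass options to the linker" style concrete, here is a minimal
sketch of a single options value that a tool fills in once, as an
alternative to one setter per stage. LinkStage and LinkStages are
hypothetical names chosen for illustration; they are not the existing
DWARFLinker or dsymutil interface.

```cpp
#include <cstdint>

// Hypothetical stage flags. The names mirror some of the setters proposed
// above but are not an existing DWARFLinker API.
enum class LinkStage : uint32_t {
  RemoveObsoleteInfo = 1u << 0,
  DebugPubNames      = 1u << 1,
  DebugPubTypes      = 1u << 2,
  DebugNames         = 1u << 3,
  GDBIndex           = 1u << 4,
};

// A plain value type: the tool builds it and hands it to the linker.
struct LinkStages {
  uint32_t Bits = 0; // Every stage is off by default, as in the proposal.

  LinkStages &enable(LinkStage S) {
    Bits |= static_cast<uint32_t>(S);
    return *this;
  }
  LinkStages &disable(LinkStage S) {
    Bits &= ~static_cast<uint32_t>(S);
    return *this;
  }
  bool isEnabled(LinkStage S) const {
    return (Bits & static_cast<uint32_t>(S)) != 0;
  }
};
```

A single value like this is easy to copy, compare, and extend with new
stages without changing the linker's class interface.
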
>
>
>     5. Copying source file contents, stripping tables,
>     compressing/decompressing tables
>         would be done by ObjCopy llvm library(extracted from
>     llvm-objcopy):
>
>        Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                   object::COFFObjectFile &In, Buffer
>     &Out);
>        Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                   object::ELFObjectFileBase &In,
>     Buffer &Out);
>        Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                   object::MachOObjectFile &In, Buffer
>     &Out);
>        Error executeObjcopyOnBinary(const CopyConfig &Config,
>                                   object::WasmObjectFile &In, Buffer
>     &Out);
>
>
> Just to make sure I understand this correctly. The current method 
> names suggest that you'd be running objcopy as an external tool, but 
> when implemented as a library you'd call the code in-process, right?

Not exactly. I suggest first moving these functions into the library and
then calling them from the dwarfutil code.

An example of such a call is in the prototype:

tools/llvm-dwarfutil/llvm-dwarfutil.cpp:

template <class ELFT>
Error writeOutputFile(const Options &Options, ELFObjectFile<ELFT> &InputFile,
                      DataBits &OutBits) {
   ........
   objectcopy::FileBuffer FB(Config.OutputFilename);
   return objectcopy::elf::executeObjcopyOnBinary(Config, InputFile, FB);
}

>
>     6. Address ranges and single addresses pointing to removed code should
>         be marked with tombstone value in the input file:
>
>         -2 for .debug_ranges and .debug_loc.
>         -1 for other .debug* tables.
>
>     7. Prototype implementation - https://reviews.llvm.org/D86539.
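
A note on point 6: two values are needed because in .debug_ranges and
.debug_loc the largest representable address (-1) already marks a base
address selection entry, so -2 is used there. A minimal sketch of that rule
follows. tombstoneFor and isTombstone are hypothetical helper names, not
part of DWARFLinker.

```cpp
#include <cstdint>
#include <string>

// Largest address representable in AddrSize bytes (AddrSize is 1..8).
inline uint64_t maxAddress(unsigned AddrSize) {
  return AddrSize >= 8 ? UINT64_MAX : (uint64_t{1} << (AddrSize * 8)) - 1;
}

// Hypothetical helper: tombstone value for a given debug section, following
// the -2 / -1 rule from point 6.
inline uint64_t tombstoneFor(const std::string &Section, unsigned AddrSize) {
  const uint64_t Max = maxAddress(AddrSize);
  if (Section == ".debug_ranges" || Section == ".debug_loc")
    return Max - 1; // -2: -1 is reserved for base address selection.
  return Max;       // -1 for the other .debug* tables.
}

// True if Address marks removed code in the given section.
inline bool isTombstone(uint64_t Address, const std::string &Section,
                        unsigned AddrSize) {
  return Address == tombstoneFor(Section, AddrSize);
}
```
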
>
>     ======================================================================
>
>     Roadmap:
>
>     1. Refactor llvm-objcopy to extract its implementation into a separate
>         library, ObjCopy (in the LLVM tree).
>
>
> What exactly needs to be copied? In dsymutil we create a Mach-O 
> companion file, which is really just a regular Mach-O with only the 
> debug info sections in it. I think we do copy over a few segments, but 
> we have to rewrite the load commands and obviously the DWARF sections. 
> Which part of that would be handled by the objcopy library. It seems 
> like this could be a first, standalone patch. Or do you only plan to 
> use this for the ELF parts?
objcopy can replace debug info sections. So the idea is to use the objcopy
functionality to copy the original file without modifications, except for
replacing the debug info sections, i.e. to specify the new sections in the
objcopy config:

CopyConfig.h
     StringMap<StringRef> NewDebugSections;

Then add code to ELF/ELFObjcopy.cpp that copies these sections:

   for (const auto &Sec : Config.NewDebugSections) {
     ArrayRef<uint8_t> DataBits((const uint8_t *)Sec.getValue().data(),
                                Sec.getValue().size());
     Section NewSection(DataBits);

     if (Config.CompressionType != DebugCompressionType::None)
       Obj.addSection<CompressedSection>(NewSection, Config.CompressionType);
     else
       Obj.addSection<Section>(NewSection);
   }

Finally, executeObjcopyOnBinary() can be called, and the source file will
be copied with its debug info sections replaced:

objectcopy::elf::executeObjcopyOnBinary(Config, InputFile, FB);
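
The intended effect can be illustrated with a toy model that is independent
of LLVM: every section of the input is copied unchanged, except that the
sections listed in the new-debug-sections map take their contents from that
map. SectionMap and copyWithReplacedDebugSections are hypothetical names
used only for this sketch; the real code operates on the objcopy Object
model shown above.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using SectionData = std::vector<uint8_t>;
// Section name -> raw section contents (a stand-in for a real object file).
using SectionMap = std::map<std::string, SectionData>;

// Copy all input sections as-is; sections named in NewDebugSections are
// replaced (or added if the input did not have them).
SectionMap copyWithReplacedDebugSections(const SectionMap &Input,
                                         const SectionMap &NewDebugSections) {
  SectionMap Output = Input;
  for (const auto &Sec : NewDebugSections)
    Output[Sec.first] = Sec.second;
  return Output;
}
```
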

As for what should be moved from llvm-objcopy into the ObjCopy library:
Buffer.h, CopyConfig.h, and the entire ELF, MachO, wasm, and COFF directories.
This is done in the prototype (the prototype copied only the ELF part).

The external interface of that library would be described by:

ELF/ELFObjcopy.h
COFF/COFFObjcopy.h
MachO/MachOObjcopy.h
wasm/WasmObjcopy.h


>     2. Create a command line utility using the existing DWARFLinker and
>         ObjCopy implementation. The first version is supposed to work with
>         only ELF input object files. It would take an input ELF file with
>         unoptimized debug info and create an output ELF file with optimized
>         debug info. That version would be done out of the LLVM tree.
>
>
> I would prefer doing this incrementally in-tree. It will make 
> reviewing these patches much easier and hopefully allow us to identify 
> opportunities where we can improve both the ELF and the Mach-O variant.

It is OK with me to start doing it in-tree.


>
>     3. Make a tool to be able to work in multi-thread mode.
>
>
> I'm a bit confused by what you mean here. The current DwarfLinker 
> already does the analysis and cloning in parallel. As I've mentioned 
> in the original thread, when I implemented this, there was no way to 
> do better if you want to deduplicate across compilation units which is 
> what gives the biggest size reduction.
>
>
>     4. Consider it to be included into LLVM tree.
>
>
> As I said before I'd rather see this developed incrementally in-tree.
>
>
>     5. Support DWARF5 tables.
>
>
> I assume you mean the line tables (and not the accelerator tables, 
> i.e. debug names)?

.debug_names is already supported in dsymutil/DWARFLinker, so nothing needs
to be done for it.

I mean .debug_line/.debug_line_str, .debug_rnglists, .debug_loclists, and
DW_OP_addrx.

>
>     ======================================================================
>
>     Appendix A. Should this tool be implemented as a new tool or as an
>     extension
>                  to dsymutil/llvm-objcopy?
>
>         There already exists a tool which removes obsolete debug info on
>         Darwin - dsymutil. Why create another tool instead of extending the
>         existing dsymutil/llvm-objcopy?
>
>         The main functionality of dsymutil is located in a separate library
>         - DWARFLinker. Thus, the dsymutil utility is a command-line
>         interface for DWARFLinker. dsymutil has another type of input/output
>         data: it takes several object files and an address map as input and
>         creates a .dSYM bundle with linked debug info as output.
>         llvm-dwarfutil would take a built executable as input and create an
>         optimized executable as output. Additionally, there would be many
>         command-line options specific to only one utility. This means that
>         these utilities (implementing the command-line interface) would
>         differ significantly. It makes sense not to put another command-line
>         utility inside the existing dsymutil, but to make it a separate
>         utility. That is the reason why llvm-dwarfutil is suggested to be
>         implemented not as a sub-part of dsymutil but as a separate tool.
>
>         Please share your preference: whether llvm-dwarfutil should be
>         separate utility, or a variant of dsymutil compiled for ELF?
>
>
> As the majority of the code has already been hoisted to LLVM for use 
> in LLD, I think two separate tools are fine. I would prefer trying to 
> share a common interface, I'm thinking mostly of the command line 
> options. I'm not saying they should be a drop-in replacement for each 
> other, but it'd be nice if we didn't diverge on common functionality.
Agreed.
>
>     ======================================================================
>
>     Appendix B. The machO object file format is already supported by
>     dsymutil.
>         Depending on the decision whether llvm-dwarfutil would be done
>     as a
>     subproject
>         of dsymutil or as a separate utility - machO would be
>     supported or not.
>
>
> I don't think there's any value in having the new tool support Mach-O. 
> Things that could be shared should be hoisted into L
>
>
>     ======================================================================
>
>     Appendix C. Support for the COFF and WASM object file formats
>     presented as
>          possible future improvement. It would be quite easy to add them
>     assuming
>          that llvm-objcopy already supports these formats. It also
>     would require
>          supporting DWARF6-suggested tombstone values(-1/-2).
>
>     ======================================================================
>
>     Appendix D. Documentation.
>
>        - proposal for DWARF6 which suggested -1/-2 values for marking bad
>     addresses
>     http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>        - dsymutil tool https://llvm.org/docs/CommandGuide/dsymutil.html.
>        - proposal "Remove obsolete debug info in lld."
>     http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>
>     ======================================================================
>
>     Appendix E. Possible command line options:
>
>     DwarfUtil Options:
>
>        --build-aranges           - generate .debug_aranges table.
>        --build-debug-names       - generate .debug_names table.
>        --build-debug-pubnames    - generate .debug_pubnames table.
>        --build-debug-pubtypes    - generate .debug_pubtypes table.
>        --build-gdb-index         - generate .gdb_index table.
>        --compress                - Compress debug tables.
>        --decompress              - Decompress debug tables.
>        --deduplicate-types       - Do ODR deduplication for debug types.
>        --garbage-collect         - Do garbage collecting for debug info.
>
>
> This is of course up to you to decide, but as a potential user I might 
> be worried about making all the functionality opt-in. For dsymutil you 
> don't have to pass any options most of the time. Maybe it would be nice 
> to have a set of defaults and the ability to -fenable or -fdisable 
> them? Or having something like -debugger-tuning in clang?

Yes, the idea is to have defaults and to be able to switch options on/off.

In the updated prototype,

"llvm-dwarfutil bin/test_clang_in -o bin/test_clang_out"

assumes --garbage-collect, --strip-unoptimized-debug, and --tombstone=bfd.

Additionally, these options can be explicitly switched on/off:

"llvm-dwarfutil --strip-unoptimized-debug=0 bin/test_clang_in -o bin/test_clang_out"
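
The defaults-plus-overrides behaviour can be sketched as follows. This is
only an illustration of the idea, not the prototype's option parsing:
ToolOptions and parseOptions are hypothetical names, and the sketch assumes
that a bare "--name" means "on" and "--name=0" means "off".

```cpp
#include <string>
#include <vector>

// Defaults match what the updated prototype assumes when no option is given.
struct ToolOptions {
  bool GarbageCollect = true;
  bool StripUnoptimizedDebug = true;
  std::string Tombstone = "bfd";
};

// Start from the defaults and apply explicit overrides. Arguments that are
// not recognized (for example input/output paths) are ignored here.
ToolOptions parseOptions(const std::vector<std::string> &Args) {
  ToolOptions Opts;
  for (const std::string &Arg : Args) {
    const std::string::size_type Eq = Arg.find('=');
    const bool HasValue = Eq != std::string::npos;
    const std::string Name = Arg.substr(0, Eq);
    const std::string Value = HasValue ? Arg.substr(Eq + 1) : "1";
    if (Name == "--garbage-collect")
      Opts.GarbageCollect = Value != "0";
    else if (Name == "--strip-unoptimized-debug")
      Opts.StripUnoptimizedDebug = Value != "0";
    else if (Name == "--tombstone" && HasValue)
      Opts.Tombstone = Value;
  }
  return Opts;
}
```
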


>        --num-threads=<n>         - Specify the maximum number (n) of
>     simultaneous threads
>                                    to use when optimizing input file.
>                                    Defaults to the number of cores on the
>     current machine.
>
>
> We can make `j` the default alias for this option. It's supported by 
> dsymutil but we kept the long option in the help output but I'm happy 
> to change that.

Added "j" as an alias for --num-threads.
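
For the "defaults to the number of cores" part, a minimal sketch of how the
thread count could be resolved is shown below. resolveNumThreads is a
hypothetical helper, and treating 0 as "not specified" is an assumption of
this sketch.

```cpp
#include <thread>

// Hypothetical helper: Requested == 0 means the user did not pass
// --num-threads, so fall back to the number of hardware threads.
unsigned resolveNumThreads(unsigned Requested) {
  if (Requested != 0)
    return Requested;
  // hardware_concurrency() may return 0 when the value is not computable.
  const unsigned Cores = std::thread::hardware_concurrency();
  return Cores != 0 ? Cores : 1;
}
```
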


>        --strip-all               - Strip all debug tables.
>        --strip=<name1,name2>     - Strip specified debug info tables.
>        --strip-unoptimized-debug - Strip all unoptimized debug tables.
>        --tombstone=<value>       - Tombstone value used as a marker of
>     invalid address.
>          =bfd                    -   BFD default value
>          =dwarf6                 -   Dwarf v6.
>        --verbose                 - Enable verbose logging and encoding
>     details.
>
>     Generic Options:
>
>        --help                    - Display available options
>     (--help-hidden
>     for more)
>        --version                 - Display the version of this program
>
>
> dsymutil also has a --verify option which runs the DWARF verifier on 
> the output (I'm working on a patch to also run it on the input). It 
> might be a nice addition to have this too down the road.

OK, I will add it.


Thank you for the comments!

Alexey.

