[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
David Blaikie via llvm-dev
llvm-dev at lists.llvm.org
Wed Sep 2 11:44:24 PDT 2020
On Wed, Sep 2, 2020 at 9:56 AM Alexey <avl.lapshin at gmail.com> wrote:
>
> On 01.09.2020 20:07, David Blaikie wrote:
>
> Fair enough - thanks for clarifying the differences! (I'd still lean a bit
> towards this being dwz-esque, as you say "an extension of classic dwz"
>
> I have some doubts about "llvm-dwz", since it might confuse people who would
> expect exactly the same behavior as dwz.
> But if we think of it as "an extension of classic dwz" and the possible
> confusion is not a big deal, then
> I would be fine with "llvm-dwz".
>
> using a bit more domain knowledge (of terminators and C++ odr - though I'm
> not sure dsymutil does rely on the ODR, does it? It relies on it to know
> that two names represent the same type, I suppose, but doesn't assume
> they're already identical, instead it merges their members))
>
> If dsymutil is able to find a full definition, then it removes all
> other definitions (those matched by name) and redirects all references to
> the definition it found. If it is not able to find a full definition, then it
> does nothing, i.e. if there are two incomplete
> definitions (DW_AT_declaration (true)) with the same name, then they are
> not merged. That is a possible improvement: teaching dsymutil to merge
> incomplete types.
>
Huh, what does it do with extra member function definitions found in later
definitions? (eg: struct x { template<typename T> void f(); }; - in one
translation unit x::f<int> is instantiated, in another x::f<float> is
instantiated - how are the two represented with dsymutil?)
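
To make the question concrete, here is a toy model. It is not dsymutil's
or DWARFLinker's code: TypeDie, keepFirstDefinition and mergeMembers are
names made up for this sketch, and a "DIE" is reduced to a name plus the
member functions one CU happens to list under it. The first policy follows
the behavior described in the quoted text (one full definition wins by
name); the second is a hypothetical alternative that merges members.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a DW_TAG_structure_type DIE in one CU.
struct TypeDie {
  std::string Name;
  bool IsDeclaration;            // models DW_AT_declaration (true)
  std::set<std::string> Members; // member functions listed in this CU
};

// Policy A: the first full definition seen for a name becomes canonical,
// later definitions with the same name are dropped. Declarations are ignored.
std::map<std::string, TypeDie>
keepFirstDefinition(const std::vector<TypeDie> &Dies) {
  std::map<std::string, TypeDie> Canonical;
  for (const TypeDie &D : Dies)
    if (!D.IsDeclaration)
      Canonical.emplace(D.Name, D); // emplace keeps an existing entry as-is
  return Canonical;
}

// Policy B (assumed alternative): keep one DIE per name, but add the members
// of every later definition to it.
std::map<std::string, TypeDie>
mergeMembers(const std::vector<TypeDie> &Dies) {
  std::map<std::string, TypeDie> Canonical;
  for (const TypeDie &D : Dies) {
    if (D.IsDeclaration)
      continue;
    auto [It, Inserted] = Canonical.emplace(D.Name, D);
    if (!Inserted)
      It->second.Members.insert(D.Members.begin(), D.Members.end());
  }
  return Canonical;
}

// Two translation units describe `x`, each with a different instantiation.
void demoTwoInstantiations() {
  std::vector<TypeDie> Dies = {{"x", false, {"f<int>"}},
                               {"x", false, {"f<float>"}}};
  // Under policy A the second instantiation has no member DIE to refer to.
  assert(keepFirstDefinition(Dies).at("x").Members.count("f<float>") == 0);
  // Under policy B both instantiations are present.
  assert(mergeMembers(Dies).at("x").Members.size() == 2);
}
```

Under the first policy x::f<float> has nowhere to live in the surviving
definition, which is why I'm asking how the two are represented.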
> Alexey.
>
>
> But I don't have super strong feelings about the naming.
>
> On Tue, Sep 1, 2020 at 6:36 AM Alexey <avl.lapshin at gmail.com> wrote:
>
>>
>> On 01.09.2020 06:27, David Blaikie wrote:
>>
>> A quick note: The feature as currently proposed sounds like it's an exact
>> match for 'dwz'? Is there any benefit to this over the existing dwz
>> project? Is it different in some ways I'm not aware of? (I haven't actually
>> used dwz, so I might have some mistaken ideas about how it should work)
>>
>> If it's going to solve the same general problem, but be in the llvm
>> project instead, then maybe it should be called llvm-dwz.
>>
>> It looks like dwz and llvm-dwarfutil are not exactly matched in
>> functionality.
>>
>> dwz is a program that attempts to optimize DWARF debugging information
>> contained in ELF shared libraries and ELF executables for *size*.
>>
>> llvm-dwarfutil is a tool that is used for processing debug
>> info (DWARF) located in built binary files to improve debug info *quality*,
>> reduce debug info *size* and accelerate debug info *processing*.
>>
>> Things which are supposed to be done by llvm-dwarfutil and which are not
>> done by dwz: removing obsolete debug info, building indexes, stripping
>> unneeded debug sections, compressing/decompressing debug sections.
>>
>> What the two tools have in common is that both reduce debug info size.
>> But they do this using different approaches:
>>
>> 1. dwz reduces the size of debug info by creating partial compilation
>> units for duplicated parts, so that these partial compilation units can
>> be imported in every place where the duplicate appeared. AFAIU, that
>> optimization gives the largest size saving.
>>
>> Another size-saving optimization is ODR type deduplication.
>>
>> 2. llvm-dwarfutil reduces the size of debug info by ODR type
>> deduplication, which gives the largest size saving in the
>> llvm-dwarfutil case.
>>
>> Another size-saving optimization is removing obsolete debug info
>> (which is actually not only about size but also about correctness).
>>
>> So, it looks like these tools are not equivalent. If we consider the
>> new tool to be an extension of classic dwz, then we could probably
>> name it llvm-dwz.
>>
>>
>> Though I understand the desire for this to grow other functionality, like
>> DWARF-aware dwp-ing. Might be better for this to busybox and provide that
>> functionality under llvm-dwp instead, or more likely, I suspect, that the
>> existing llvm-dwp will be rewritten (probably by me) to use more of lld's
>> infrastructure to be more efficient (its current object reading/writing
>> logic is using LLVM's libObject and MCStreamer, which is a bit inefficient
>> for a very content-unaware linking process) and then maybe that could be
>> taught to use DwarfLinker as a library to optionally do DWARF-aware linking
>> depending on the user's time/space tradeoff desires. Still benefiting from
>> any improvements to the underlying DwarfLinker library (at which point that
>> would be shared between llvm-dsymutil, llvm-dwz, and llvm-dwp).
>>
>> On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>>> Any thoughts on this?
>>> Thanks in advance, Alexey.
>>>
>>> ======================================================================
>>>
>>> llvm-dwarfutil (Apndx A) is a tool that is used for processing debug
>>> info (DWARF) located in built binary files to improve debug info quality,
>>> reduce debug info size and accelerate debug info processing.
>>> Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
>>> WASM (Apndx C).
>>>
>>> ======================================================================
>>>
>>> Specifically, the tool would do:
>>>
>>> - Remove obsolete debug info which refers to code deleted by the
>>>   linker doing garbage collection (gc-sections).
>>>
>>> - Deduplicate debug type definitions to reduce the resulting size of
>>>   the binary.
>>>
>>> - Build accelerator/index tables:
>>>   .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>>   .debug_pubtypes.
>>>
>>> - Strip unneeded tables:
>>>   .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>>   .debug_pubtypes.
>>>
>>> - Compress or decompress debug info as requested.
>>>
>>> Possible feature:
>>>
>>> - Join split DWARF .dwo files into a single file containing all debug
>>>   info (convert split DWARF into monolithic DWARF).
>>>
>>> ======================================================================
>>>
>>> User interface:
>>>
>>> OVERVIEW: A tool for optimizing debug info located in the built binary.
>>>
>>> USAGE: llvm-dwarfutil [options] input output
>>>
>>> OPTIONS: (Apndx E)
>>>
>>> ======================================================================
>>>
>>> Implementation notes:
>>>
>>> 1. Removing obsolete debug info would be done using the DWARFLinker LLVM
>>> library.
>>>
>>> 2. Data type deduplication would be done using the DWARFLinker LLVM library.
>>>
>>> 3. Accelerator/index tables would be generated using the DWARFLinker LLVM
>>> library.
>>>
>>> 4. The interface of the DWARFLinker library would be changed in such a way
>>> that it would be possible to switch various stages on/off:
>>>
>>> class DWARFLinker {
>>> public:
>>>   // Each setter switches one stage on or off.
>>>   void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>>
>>>   void setDoAppleNames(bool DoAppleNames = false);
>>>   void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>>   void setDoAppleTypes(bool DoAppleTypes = false);
>>>   void setDoObjC(bool DoObjC = false);
>>>   void setDoDebugPubNames(bool DoDebugPubNames = false);
>>>   void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>>
>>>   void setDoDebugNames(bool DoDebugNames = false);
>>>   void setDoGDBIndex(bool DoGDBIndex = false);
>>> };
>>>
>>> 5. Copying source file contents, stripping tables, and
>>> compressing/decompressing tables would be done by the ObjCopy LLVM
>>> library (extracted from llvm-objcopy):
>>>
>>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>                              object::COFFObjectFile &In, Buffer &Out);
>>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>                              object::ELFObjectFileBase &In, Buffer &Out);
>>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>                              object::MachOObjectFile &In, Buffer &Out);
>>> Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>                              object::WasmObjectFile &In, Buffer &Out);
>>>
>>> 6. Address ranges and single addresses pointing to removed code should
>>> be marked with a tombstone value in the input file:
>>>
>>> -2 for .debug_ranges and .debug_loc.
>>> -1 for other .debug* tables.
>>>
>>> 7. Prototype implementation - https://reviews.llvm.org/D86539.
>>>
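
The tombstone convention in item 6 can be sketched as below. This is only
an illustration under assumptions: tombstoneFor and isTombstone are
hypothetical helper names, not LLVM API, and "-1"/"-2" are taken as the
largest and second-largest values representable in the target address
size. The reason -1 is avoided for .debug_ranges and .debug_loc is that
in those sections the largest representable address already marks a base
address selection entry.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helper: tombstone value for a section, given the address
// size in bytes (4 or 8 in practice).
constexpr std::uint64_t tombstoneFor(std::string_view Section,
                                     unsigned AddrSize) {
  // "-1" truncated to the address size, i.e. all bits set.
  std::uint64_t AllOnes =
      AddrSize >= 8 ? UINT64_MAX : ((std::uint64_t{1} << (AddrSize * 8)) - 1);
  // .debug_ranges and .debug_loc use -2, everything else uses -1.
  bool UsesMinusTwo = Section == ".debug_ranges" || Section == ".debug_loc";
  return UsesMinusTwo ? AllOnes - 1 : AllOnes;
}

// Hypothetical helper: does this address mark removed code?
constexpr bool isTombstone(std::uint64_t Addr, std::string_view Section,
                           unsigned AddrSize) {
  return Addr == tombstoneFor(Section, AddrSize);
}

// Compile-time checks of the two cases listed in item 6.
static_assert(tombstoneFor(".debug_ranges", 4) == 0xFFFFFFFEu);
static_assert(tombstoneFor(".debug_info", 8) == UINT64_MAX);
```

A consumer of the input file would call isTombstone on each address it
reads and treat a match as "this entry refers to code the linker removed".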
>>> ======================================================================
>>>
>>> Roadmap:
>>>
>>> 1. Refactor llvm-objcopy to extract its implementation into a separate
>>> library, ObjCopy (in the LLVM tree).
>>>
>>> 2. Create a command line utility using the existing DWARFLinker and
>>> ObjCopy implementations. The first version is supposed to work with only
>>> ELF input object files.
>>> It would take an input ELF file with unoptimized debug info and create
>>> an output ELF file with optimized debug info. That version would be
>>> done out of the LLVM tree.
>>>
>>> 3. Make the tool able to work in multi-threaded mode.
>>>
>>> 4. Consider it for inclusion into the LLVM tree.
>>>
>>> 5. Support DWARF5 tables.
>>>
>>> ======================================================================
>>>
>>> Appendix A. Should this tool be implemented as a new tool or as an
>>> extension to dsymutil/llvm-objcopy?
>>>
>>> There already exists a tool which removes obsolete debug info on
>>> darwin: dsymutil. Why create another tool instead of extending the
>>> existing dsymutil/llvm-objcopy?
>>>
>>> The main functionality of dsymutil is located in a separate library,
>>> DWARFLinker. Thus, the dsymutil utility is a command-line interface for
>>> DWARFLinker. dsymutil has a different kind of input/output data: it takes
>>> several object files and an address map as input and creates a .dSYM
>>> bundle with linked debug info as output. llvm-dwarfutil would take a
>>> built executable as input and create an optimized executable as output.
>>> Additionally, there would be many command-line options specific to
>>> only one of the utilities. This means that these utilities (each
>>> implementing a command-line interface) would differ significantly.
>>> It makes sense not to put another command-line utility inside the
>>> existing dsymutil, but to make it a separate utility. That is the
>>> reason why llvm-dwarfutil is suggested to be implemented not as a
>>> sub-part of dsymutil but as a separate tool.
>>>
>>> Please share your preference: whether llvm-dwarfutil should be a
>>> separate utility, or a variant of dsymutil compiled for ELF?
>>>
>>> ======================================================================
>>>
>>> Appendix B. The MachO object file format is already supported by
>>> dsymutil. Whether MachO would be supported by llvm-dwarfutil depends on
>>> the decision whether it is done as a subproject of dsymutil or as a
>>> separate utility.
>>>
>>> ======================================================================
>>>
>>> Appendix C. Support for the COFF and WASM object file formats is
>>> presented as a possible future improvement. It would be quite easy to
>>> add them, assuming that llvm-objcopy already supports these formats. It
>>> would also require supporting the DWARF6-suggested tombstone
>>> values (-1/-2).
>>>
>>> ======================================================================
>>>
>>> Appendix D. Documentation.
>>>
>>> - proposal for DWARF6 which suggested -1/-2 values for marking bad
>>>   addresses: http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>> - dsymutil tool: https://llvm.org/docs/CommandGuide/dsymutil.html
>>> - proposal "Remove obsolete debug info in lld.":
>>>   http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>>
>>> ======================================================================
>>>
>>> Appendix E. Possible command line options:
>>>
>>> DwarfUtil Options:
>>>
>>> --build-aranges           - generate .debug_aranges table.
>>> --build-debug-names       - generate .debug_names table.
>>> --build-debug-pubnames    - generate .debug_pubnames table.
>>> --build-debug-pubtypes    - generate .debug_pubtypes table.
>>> --build-gdb-index         - generate .gdb_index table.
>>> --compress                - Compress debug tables.
>>> --decompress              - Decompress debug tables.
>>> --deduplicate-types       - Do ODR deduplication for debug types.
>>> --garbage-collect         - Do garbage collecting for debug info.
>>> --num-threads=<n>         - Specify the maximum number (n) of
>>>                             simultaneous threads to use when
>>>                             optimizing input file. Defaults to the
>>>                             number of cores on the current machine.
>>> --strip-all               - Strip all debug tables.
>>> --strip=<name1,name2>     - Strip specified debug info tables.
>>> --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>> --tombstone=<value>       - Tombstone value used as a marker of
>>>                             invalid address.
>>>     =bfd                  -   BFD default value
>>>     =dwarf6               -   Dwarf v6.
>>> --verbose                 - Enable verbose logging and encoding
>>>                             details.
>>>
>>> Generic Options:
>>>
>>> --help                    - Display available options (--help-hidden
>>>                             for more)
>>> --version                 - Display the version of this program
>>>
>>>