[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 22 14:22:42 PDT 2020


On Fri, Sep 4, 2020 at 3:42 AM Alexey <avl.lapshin at gmail.com> wrote:

>
> On 03.09.2020 20:56, David Blaikie wrote:
>
>
>
> On Thu, Sep 3, 2020 at 5:15 AM Alexey <avl.lapshin at gmail.com> wrote:
>
>>
>> On 03.09.2020 01:36, David Blaikie wrote:
>>
>>
>>
>> On Wed, Sep 2, 2020 at 3:26 PM Alexey <avl.lapshin at gmail.com> wrote:
>>
>>>
>>> On 02.09.2020 21:44, David Blaikie wrote:
>>>
>>>
>>>
>>> On Wed, Sep 2, 2020 at 9:56 AM Alexey <avl.lapshin at gmail.com> wrote:
>>>
>>>>
>>>> On 01.09.2020 20:07, David Blaikie wrote:
>>>>
>>>> Fair enough - thanks for clarifying the differences! (I'd still lean a
>>>> bit towards this being dwz-esque, as you say "an extension of classic dwz"
>>>>
>>>> I have some doubts about "llvm-dwz", since it might confuse people who
>>>> would expect exactly the same behavior as dwz.
>>>> But if we think of it as "an extension of classic dwz" and the possible
>>>> confusion is not a big deal, then I would be fine with "llvm-dwz".
>>>>
>>>> using a bit more domain knowledge (of terminators and C++ odr - though
>>>> I'm not sure dsymutil does rely on the ODR, does it? It relies on it to
>>>> know that two names represent the same type, I suppose, but doesn't assume
>>>> they're already identical, instead it merges their members))
>>>>
>>>> If dsymutil is able to find a full definition, then it removes all
>>>> other definitions (those matched by name) and points all references at
>>>> the definition it found. If it is not able to find a full definition,
>>>> then it does nothing, i.e. if there are two incomplete definitions
>>>> (DW_AT_declaration (true)) with the same name, then they are not
>>>> merged. That is a possible improvement: teaching dsymutil to merge
>>>> incomplete types.
>>>>
>>> Huh, what does it do with extra member function definitions found in
>>> later definitions? (eg: struct x { template<typename T> void f(); }; - in
>>> one translation unit x::f<int> is instantiated, in another x::f<float> is
>>> instantiated - how are the two represented with dsymutil?)
>>>
>>> They would be considered two unmatched types. dsymutil would not merge
>>> them and thus would not use a single type description. There would be
>>> two separate types called "x" whose members mostly match but which
>>> differ in x::f<int> and x::f<float>. There is no de-duplication in
>>> that case.
>>>
>> Oh, that's unfortunate. It'd be nice for C++ at least, to implement a
>> potentially faster dsymutil mode that could get this right and not have to
>> actually check for type equivalence, instead relying on the name of the
>> type to determine that it must be identical.
>>
>> Right. That would result in even more size reduction.
>>
>>
>> The first instance of the type that's encountered has its fully qualified
>> name or mangled name recorded in a map pointing to the DIE. Any future
>> instance gets downgraded to a declaration, and /certain/ members get
>> dropped, but other members get stuck on the declaration (same sort of DWARF
>> you see with "struct foo { virtual void f1(); template<typename T> void
>> f2() { } }; void test(foo& f) { f.f2<int>(); }"). Recording all the member
>> functions of the type/static member variable types might be needed in cases
>> where some member functions are defined in one translation unit and some
>> defined in another - though I guess that infrastructure is already in
>> place/that just works today.
>>
>> My understanding is that there is no such infrastructure currently.
>> The current infrastructure allows other units to reference a single
>> existing (canonical) type declaration. It does not allow referencing
>> different parts (in different units) of an incomplete type.
>>
>
> Huh, so what does the DWARF look like when you define one member function
> in one file, and another member function (common with inline functions) in
> another file?
>
>
>> I think it would be necessary to change the order in which compilation
>> units are processed to implement such type merging.
>>
>
> Oh, I wasn't suggesting merging them - or didn't mean to suggest that. I
> meant doing something like what we do in LLVM for type homed
> (no-standalone) DWARF, where we attach function declarations to type
> declarations, eg:
>
> struct x {
>   void f1();
>   void f2();
>   template<typename T>
>   static void f3();
> };
>
> #ifdef HOME
> void x::f1() {
> }
> #endif
>
> #ifdef AWAY
> void x::f2() {
> }
> #endif
>
> #ifdef TEMPL
> template<typename T>
> void x::f3() {
> }
> template void x::f3<int>();
> #endif
>
> Building "HOME" would show the DWARF I'd expect to see the first time a
> type definition is encountered during dsym.
> Building "AWAY" raises the question of - what does dsymutil do with this
> DWARF? Does it deduplicate the type, and make the definition of 'f2' point
> to the 'f2' declaration in the original type described in the prior CU
> defined in "HOME"? If it doesn't do that, it could/that would be good to
> reduce the DWARF size.
> Building "TEMPL" would show the DWARF I'd expect to see if a future use of
> that type definition was encountered but the original/home definition had
> no declaration of this function: we should then emit maybe an "extension"
> to the type (could be a straight declaration, or maybe some newer/weirder
> hybrid that points to the definition with some attribute) & then inject the
> declaration of the template/other new member into this extension
> definition, etc.
>
> Please check the reduced DWARF generated by the current dsymutil for the
> example above:
>
> 0x0000000b: DW_TAG_compile_unit
>               DW_AT_language    (DW_LANG_C_plus_plus)
>               DW_AT_name        ("home.cpp")
>               DW_AT_stmt_list   (0x00000000)
>               DW_AT_low_pc      (0x0000000100000f80)
>               DW_AT_high_pc     (0x0000000100000f8b)
>
> 0x0000002a:   DW_TAG_structure_type
>                 DW_AT_name      ("x")
>                 DW_AT_byte_size (0x01)
>
> 0x00000033:     DW_TAG_subprogram
>                   DW_AT_linkage_name    ("_ZN1x2f1Ev")
>                   DW_AT_name    ("f1")
>                   DW_AT_type    (0x000000000000005e "int")
>                   DW_AT_declaration     (true)
>                   DW_AT_external        (true)
>                   DW_AT_APPLE_optimized (true)
>
> 0x00000047:       NULL
>
> 0x00000048:     DW_TAG_subprogram
>                   DW_AT_linkage_name    ("_ZN1x2f2Ev")
>                   DW_AT_name    ("f2")
>                   DW_AT_type    (0x000000000000005e "int")
>                   DW_AT_declaration     (true)
>                   DW_AT_external        (true)
>                   DW_AT_APPLE_optimized (true)
>
> 0x0000005c:       NULL
> 0x0000005d:     NULL
>
> 0x0000006a:   DW_TAG_subprogram
>                 DW_AT_low_pc    (0x0000000100000f80)
>                 DW_AT_high_pc   (0x0000000100000f8b)
>                 DW_AT_specification     (0x0000000000000033 "_ZN1x2f1Ev")
>
>
> 0x000000a0: DW_TAG_compile_unit
>               DW_AT_language    (DW_LANG_C_plus_plus)
>               DW_AT_name        ("away.cpp")
>               DW_AT_stmt_list   (0x00000048)
>               DW_AT_low_pc      (0x0000000100000f90)
>               DW_AT_high_pc     (0x0000000100000f9b)
>
> 0x000000c6:   DW_TAG_subprogram
>                 DW_AT_low_pc    (0x0000000100000f90)
>                 DW_AT_high_pc   (0x0000000100000f9b)
>                 DW_AT_specification     (0x0000000000000048 "_ZN1x2f2Ev")
>
> 0x000000fc: DW_TAG_compile_unit
>               DW_AT_language    (DW_LANG_C_plus_plus)
>               DW_AT_name        ("templ.cpp")
>               DW_AT_stmt_list   (0x00000090)
>               DW_AT_low_pc      (0x0000000100000fa0)
>               DW_AT_high_pc     (0x0000000100000fab)
>
> 0x0000011b:   DW_TAG_structure_type
>                 DW_AT_name      ("x")
>                 DW_AT_byte_size (0x01)
>
> 0x00000124:     DW_TAG_subprogram
>                   DW_AT_linkage_name    ("_ZN1x2f1Ev")
>                   DW_AT_name    ("f1")
>                   DW_AT_type    (0x0000000000000168 "int")
>                   DW_AT_declaration     (true)
>                   DW_AT_external        (true)
>                   DW_AT_APPLE_optimized (true)
> 0x00000138:       NULL
>
> 0x00000139:     DW_TAG_subprogram
>                   DW_AT_linkage_name    ("_ZN1x2f2Ev")
>                   DW_AT_name    ("f2")
>                   DW_AT_type    (0x0000000000000168 "int")
>                   DW_AT_declaration     (true)
>                   DW_AT_external        (true)
>                   DW_AT_APPLE_optimized (true)
> 0x0000014d:       NULL
>
> 0x0000014e:     DW_TAG_subprogram
>                   DW_AT_linkage_name    ("_ZN1x2f3IiEEiv")
>                   DW_AT_name    ("f3<int>")
>                   DW_AT_type    (0x0000000000000168 "int")
>                   DW_AT_declaration     (true)
>                   DW_AT_external        (true)
>                   DW_AT_APPLE_optimized (true)
> 0x00000166:       NULL
> 0x00000167:     NULL
>
> 0x00000174:   DW_TAG_subprogram
>                 DW_AT_low_pc    (0x0000000100000fa0)
>                 DW_AT_high_pc   (0x0000000100000fab)
>                 DW_AT_specification     (0x000000000000014e "_ZN1x2f3IiEEiv")
> 0x00000190:     NULL
>
>
> >Building "HOME" would show the DWARF I'd expect to see the first time a
> type definition is encountered during dsym.
>
> The compile unit "home.cpp" contains the type definition (0x0000002a) and
> a reference to its member (DW_AT_specification (0x0000000000000033
> "_ZN1x2f1Ev")).
>
> >Building "AWAY" raises the question of - what does dsymutil do with this
> DWARF? Does it deduplicate the type, and make the definition of 'f2' point
> to the 'f2' declaration in the original type described in the prior CU
> defined in "HOME"? If it doesn't do that, it could/that would be good to
> reduce the DWARF size.
>
> The compile unit "away.cpp" does not contain a type definition; it
> references the type definition from compile unit "home.cpp"
> (DW_AT_specification (0x0000000000000048 "_ZN1x2f2Ev")).
> I.e. dsymutil deduplicates the type and makes the definition of 'f2' point
> to the 'f2' declaration in the original type described in the prior CU
> "home.cpp".
>
> >Building "TEMPL" would show the DWARF I'd expect to see if a future use
> of that type definition was encountered but the original/home definition
> had no declaration of this function: we should then emit maybe an
> "extension" to the type (could be a straight declaration, or maybe some
> newer/weirder hybrid that points to the definition with some attribute) &
> then inject the declaration of the template/other new member into this
> extension definition, etc.
>
> The compile unit "templ.cpp" contains the type definition (0x0000011b),
> which matches (0x0000002a) but also declares the new member 0x0000014e.
> It also references this new member via DW_AT_specification
> (0x000000000000014e "_ZN1x2f3IiEEiv"). In this case the type description
> is not de-duplicated.
>

Ah, yeah - that seems like a missed opportunity - duplicating the whole
type DIE. LTO does this by making monolithic types - merging all the
members from different definitions of the same type into one, but that's
maybe too expensive for dsymutil (might still be interesting to know how
much more expensive, etc). But I think the other way to go would be to
produce a declaration of the type, with the relevant members - and let the
DWARF consumer identify this declaration as matching up with the earlier
definition. That's the sort of DWARF you get from the non-MachO default
-fno-standalone-debug anyway, so it's already pretty well tested/supported
(support in lldb's a bit younger/more work-in-progress, admittedly). I
wonder how much dsym size there is that could be reduced by such an
implementation.
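
To make that concrete, here's a rough sketch of the "first definition wins,
later ones become declarations carrying only the new members" idea. This is
not dsymutil/DWARFLinker code: TypeDesc and TypeDeduplicator are made-up
names, and a type is reduced to its name plus the linkage names of its
member declarations, which is a big simplification of real DIEs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a type DIE: a name plus the
// linkage names of its member function declarations.
struct TypeDesc {
  std::string Name;
  std::vector<std::string> Members;
  bool IsDeclaration = false;
};

// Hypothetical helper: keeps the first description of each type whole and
// reduces every later description to a declaration that carries only the
// members not already emitted for that type name.
class TypeDeduplicator {
  // Type name -> member linkage names already emitted for that type.
  std::map<std::string, std::set<std::string>> Seen;

public:
  TypeDesc process(const TypeDesc &In) {
    assert(!In.Name.empty() && "this sketch only handles named types");
    auto Res = Seen.emplace(In.Name, std::set<std::string>());
    const bool IsFirst = Res.second;
    std::set<std::string> &Known = Res.first->second;

    TypeDesc Out;
    Out.Name = In.Name;
    Out.IsDeclaration = !IsFirst;
    for (const std::string &M : In.Members) {
      // insert().second is true only for members not recorded before.
      const bool IsNew = Known.insert(M).second;
      if (IsFirst || IsNew)
        Out.Members.push_back(M);
    }
    return Out;
  }
};
```

Feeding it "x" with f1/f2 (home.cpp) and then "x" with f1/f2/f3<int>
(templ.cpp) leaves the first description untouched and turns the second into
a declaration of "x" whose only member is _ZN1x2f3IiEEiv.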


>
> Do you suggest that 0x0000011b should be transformed into something like
> this:
>
> 0x000000fc: DW_TAG_compile_unit
>               DW_AT_language    (DW_LANG_C_plus_plus)
>               DW_AT_name        ("templ.cpp")
>               DW_AT_stmt_list   (0x00000090)
>               DW_AT_low_pc      (0x0000000100000fa0)
>               DW_AT_high_pc     (0x0000000100000fab)
>
> 0x0000011b:   DW_TAG_structure_type
>                 DW_AT_specification (0x0000002a "x")
>
> 0x00000124:     DW_TAG_subprogram
>                   DW_AT_linkage_name    ("_ZN1x2f3IiEEiv")
>                   DW_AT_name    ("f3<int>")
>                   DW_AT_type    (0x000000000000005e "int")
>                   DW_AT_declaration     (true)
>                   DW_AT_external        (true)
>                   DW_AT_APPLE_optimized (true)
> 0x00000138:       NULL
> 0x00000139:     NULL
>
> 0x00000140:   DW_TAG_subprogram
>                 DW_AT_low_pc    (0x0000000100000fa0)
>                 DW_AT_high_pc   (0x0000000100000fab)
>                 DW_AT_specification     (0x0000000000000124 "_ZN1x2f3IiEEiv")
> 0x00000155:     NULL
>
> Did I get the idea correctly?
>

Yep, more or less. It'd be "safer" if 11b didn't use DW_AT_specification to
refer to 2a, but instead was a completely independent declaration of "x".
That path is already well supported/tested: it's the work-in-progress stuff
for lldb to support -fno-standalone-debug, but gdb has been consuming DWARF
like this for years, and Clang and GCC have both produced it for years (if
the type is "homed" in another file, they emit a declaration with just the
members needed to define any member functions defined/inlined/referenced in
this CU).

But using DW_AT_specification, or maybe some other extension attribute,
might make the consumer's task of finding that they're all the same
type/pieces of the same type a bit easier (could do both: use an extension
attribute to tie them up, and leave DW_AT_declaration/DW_AT_name here for
consumers that don't understand the extension attribute).
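
On the consumer side, the name-based matching could look roughly like the
sketch below. Again, this is only an illustration under simplifying
assumptions: TypePiece, MergedType and mergeByName are hypothetical names,
not lldb or gdb APIs, and a real consumer would key on the fully qualified
or mangled name rather than a bare string.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified view of one type DIE as a consumer sees it.
struct TypePiece {
  std::string Name;                 // stands in for the qualified name
  bool IsDeclaration;               // DW_AT_declaration present?
  std::vector<std::string> Members; // linkage names of member declarations
};

// What the consumer ends up knowing about one type after matching pieces.
struct MergedType {
  bool HasDefinition = false;
  std::set<std::string> Members;
};

// Groups all pieces by name and unions their members, so that a definition
// in one CU and member-carrying declarations in other CUs are treated as
// one type.
std::map<std::string, MergedType>
mergeByName(const std::vector<TypePiece> &Pieces) {
  std::map<std::string, MergedType> Result;
  for (const TypePiece &P : Pieces) {
    assert(!P.Name.empty() && "this sketch only handles named types");
    MergedType &M = Result[P.Name];
    if (!P.IsDeclaration)
      M.HasDefinition = true;
    M.Members.insert(P.Members.begin(), P.Members.end());
  }
  return Result;
}
```

With the definition of "x" from home.cpp (f1, f2) and a declaration of "x"
from templ.cpp carrying only f3<int>, the merged view of "x" has a
definition and all three members.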


>
>
>
>
>> Currently, once a compilation unit has been analyzed (scanned for types
>> and dead info), it starts being emitted.
>> It looks like, to support merging, it would be necessary to analyze all
>> CUs first (to create a canonical representation) and then start emitting
>> them.
>>
>> I am going to start working on a prototype of a parallel,
>> per-compilation-unit implementation of DWARFLinker
>> (based on the scenario Jonas described in another message in this
>> thread).
>> Type merging could be the next step...
>>
>> Following is the result of compiling your example on darwin (showing
>> that dsymutil does not merge such types):
>>
>
> Ah, yeah, that is unfortunate - so if there were other members of "x" they
> would be duplicated in this case, right?
>
> This is a pretty common issue in C++ - there are 3 reasons I know of where
> LLVM would produce distinct descriptions:
> 1) member function templates, like this
> 2) member/nested types
> 3) implicit special members (not present unless instantiated - so if you
> copy construct an object in one file and not in another, two different
> types)
>
>
>>
>> $ cat struct.h
>>
>> #ifndef MY_H
>> #define MY_H
>>
>> struct foo {
>>   template <class T> int fff () { return sizeof(T); }
>> };
>>
>> #endif // MY_H
>>
>> $ cat mod1.cpp
>>
>> #include "struct.h"
>> int test1 ( ) {
>>   foo var;
>>   return var.fff<int>();
>> }
>>
>> $ cat mod2.cpp
>>
>> #include "struct.h"
>> int test2 ( ) {
>>   foo var;
>>   return var.fff<float>();
>> }
>>
>> $ cat main.cpp
>>
>> #include "struct.h"
>> int test1();
>> int test2();
>> int main ( void ) {
>>   test1();
>>   test2();
>>   return 0;
>> }
>>
>> $ clang++ main.cpp mod1.cpp mod2.cpp -O -g -fno-inline
>>
>> $ llvm-dwarfdump -a a.out.dSYM/Contents/Resources/DWARF/a.out | less
>>
>> 0x00000056: DW_TAG_compile_unit
>>               DW_AT_language    (DW_LANG_C_plus_plus)
>>               DW_AT_name        ("mod1.cpp")
>>
>> 0x000000ae:   DW_TAG_structure_type   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>                 DW_AT_name      ("foo")
>>                 DW_AT_byte_size (0x01)
>>
>> 0x000000b7:     DW_TAG_subprogram
>>                   DW_AT_linkage_name    ("_ZN3foo3fffIiEEiv")
>>                   DW_AT_name    ("fff<int>")
>>
>> 0x0000011f: DW_TAG_compile_unit
>>               DW_AT_language    (DW_LANG_C_plus_plus)
>>               DW_AT_name        ("mod2.cpp")
>>
>> 0x00000177:   DW_TAG_structure_type   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>                 DW_AT_name      ("foo")
>>                 DW_AT_byte_size (0x01)
>>
>> 0x00000180:     DW_TAG_subprogram
>>                   DW_AT_linkage_name    ("_ZN3foo3fffIfEEiv")
>>                   DW_AT_name    ("fff<float>")
>>
>>
>>
>> - Dave
>>
>>>
>>> Alexey.
>>>>
>>>>
>>>> But I don't have super strong feelings about the naming.
>>>>
>>>> On Tue, Sep 1, 2020 at 6:36 AM Alexey <avl.lapshin at gmail.com> wrote:
>>>>
>>>>>
>>>>> On 01.09.2020 06:27, David Blaikie wrote:
>>>>>
>>>>> A quick note: The feature as currently proposed sounds like it's an
>>>>> exact match for 'dwz'? Is there any benefit to this over the existing dwz
>>>>> project? Is it different in some ways I'm not aware of? (I haven't actually
>>>>> used dwz, so I might have some mistaken ideas about how it should work)
>>>>>
>>>>> If it's going to solve the same general problem, but be in the llvm
>>>>> project instead, then maybe it should be called llvm-dwz.
>>>>>
>>>>> It looks like dwz and llvm-dwarfutil are not exactly matched in
>>>>> functionality.
>>>>>
>>>>> dwz is a program that attempts to optimize DWARF debugging information
>>>>> contained in ELF shared libraries and ELF executables for *size*.
>>>>>
>>>>> llvm-dwarfutil is a tool for processing debug info (DWARF) located in
>>>>> built binary files to improve debug info *quality*, reduce debug info
>>>>> *size* and accelerate debug info *processing*.
>>>>>
>>>>> Things which llvm-dwarfutil is supposed to do and which dwz does not:
>>>>> removing obsolete debug info, building indexes, stripping unneeded
>>>>> debug sections, and compressing/decompressing debug sections.
>>>>>
>>>>> What they have in common is that both tools reduce debug info size.
>>>>> But they do this using different approaches:
>>>>>
>>>>> 1. dwz reduces the size of debug info by creating partial compilation
>>>>>    units for duplicated parts, so that these partial compilation units
>>>>>    can be imported in every duplicated place. AFAIU, that optimization
>>>>>    gives the biggest size saving.
>>>>>
>>>>>    Another size-saving optimization is ODR type deduplication.
>>>>>
>>>>> 2. llvm-dwarfutil reduces the size of debug info by ODR type
>>>>>    deduplication, which gives the biggest size saving in the
>>>>>    llvm-dwarfutil case.
>>>>>
>>>>>    Another size-saving optimization is removing obsolete debug info
>>>>>    (which is actually not only about size but also about correctness).
>>>>>
>>>>> So it looks like these tools are not equivalent. If we consider
>>>>> llvm-dwz to be an extension of classic dwz, then we could probably
>>>>> name it llvm-dwz.
>>>>>
>>>>>
>>>>> Though I understand the desire for this to grow other functionality,
>>>>> like DWARF-aware dwp-ing. Might be better for this to busybox and provide
>>>>> that functionality under llvm-dwp instead, or more likely, I suspect,
>>>>> the existing llvm-dwp will be rewritten (probably by me) to use more of
>>>>> lld's infrastructure to be more efficient (its current object
>>>>> reading/writing logic is using LLVM's libObject and MCStreamer, which is a
>>>>> bit inefficient for a very content-unaware linking process) and then maybe
>>>>> that could be taught to use DwarfLinker as a library to optionally do
>>>>> DWARF-aware linking depending on the user's time/space tradeoff desires.
>>>>> Still benefiting from any improvements to the underlying DwarfLinker
>>>>> library (at which point that would be shared between llvm-dsymutil,
>>>>> llvm-dwz, and llvm-dwp).
>>>>>
>>>>> On Tue, Aug 25, 2020 at 7:29 AM Alexey <avl.lapshin at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>    We propose llvm-dwarfutil - a dsymutil-like tool for ELF.
>>>>>>    Any thoughts on this?
>>>>>>    Thanks in advance, Alexey.
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> llvm-dwarfutil (Apndx A) is a tool for processing debug info (DWARF)
>>>>>> located in built binary files to improve debug info quality,
>>>>>> reduce debug info size and accelerate debug info processing.
>>>>>> Supported object file formats: ELF, MachO (Apndx B), COFF (Apndx C),
>>>>>> WASM (Apndx C).
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> Specifically, the tool would:
>>>>>>
>>>>>>    - Remove obsolete debug info which refers to code deleted by the
>>>>>>      linker during garbage collection (gc-sections).
>>>>>>
>>>>>>    - Deduplicate debug type definitions to reduce the size of the
>>>>>>      resulting binary.
>>>>>>
>>>>>>    - Build accelerator/index tables.
>>>>>>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>>>>>        .debug_pubtypes.
>>>>>>
>>>>>>    - Strip unneeded tables.
>>>>>>      = .debug_aranges, .debug_names, .gdb_index, .debug_pubnames,
>>>>>>        .debug_pubtypes.
>>>>>>
>>>>>>    - Compress or decompress debug info as requested.
>>>>>>
>>>>>> Possible feature:
>>>>>>
>>>>>>    - Join split DWARF .dwo files into a single file containing all
>>>>>>      debug info (convert split DWARF into monolithic DWARF).
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> User interface:
>>>>>>
>>>>>>    OVERVIEW: A tool for optimizing debug info located in the built
>>>>>>              binary.
>>>>>>
>>>>>>    USAGE: llvm-dwarfutil [options] input output
>>>>>>
>>>>>>    OPTIONS: (Apndx E)
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> Implementation notes:
>>>>>>
>>>>>> 1. Removing obsolete debug info would be done using the DWARFLinker
>>>>>>     llvm library.
>>>>>>
>>>>>> 2. Data type deduplication would be done using the DWARFLinker llvm
>>>>>>     library.
>>>>>>
>>>>>> 3. Accelerator/index tables would be generated using the DWARFLinker
>>>>>>     llvm library.
>>>>>>
>>>>>> 4. The interface of the DWARFLinker library would be changed in such
>>>>>>     a way that it would be possible to switch various stages on/off:
>>>>>>
>>>>>>    class DWARFLinker {
>>>>>>    public:
>>>>>>      void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);
>>>>>>
>>>>>>      void setDoAppleNames(bool DoAppleNames = false);
>>>>>>      void setDoAppleNamespaces(bool DoAppleNamespaces = false);
>>>>>>      void setDoAppleTypes(bool DoAppleTypes = false);
>>>>>>      void setDoObjC(bool DoObjC = false);
>>>>>>      void setDoDebugPubNames(bool DoDebugPubNames = false);
>>>>>>      void setDoDebugPubTypes(bool DoDebugPubTypes = false);
>>>>>>
>>>>>>      void setDoDebugNames(bool DoDebugNames = false);
>>>>>>      void setDoGDBIndex(bool DoGDBIndex = false);
>>>>>>    };
>>>>>>
>>>>>> 5. Copying source file contents, stripping tables, and
>>>>>>     compressing/decompressing tables would be done by the ObjCopy
>>>>>>     llvm library (extracted from llvm-objcopy):
>>>>>>
>>>>>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>>                                 object::COFFObjectFile &In,
>>>>>>                                 Buffer &Out);
>>>>>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>>                                 object::ELFObjectFileBase &In,
>>>>>>                                 Buffer &Out);
>>>>>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>>                                 object::MachOObjectFile &In,
>>>>>>                                 Buffer &Out);
>>>>>>    Error executeObjcopyOnBinary(const CopyConfig &Config,
>>>>>>                                 object::WasmObjectFile &In,
>>>>>>                                 Buffer &Out);
>>>>>>
>>>>>> 6. Address ranges and single addresses pointing to removed code
>>>>>>     should be marked with a tombstone value in the input file:
>>>>>>
>>>>>>     -2 for .debug_ranges and .debug_loc.
>>>>>>     -1 for other .debug* tables.
>>>>>>
>>>>>> 7. Prototype implementation - https://reviews.llvm.org/D86539.
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> Roadmap:
>>>>>>
>>>>>> 1. Refactor llvm-objcopy to extract its implementation into a
>>>>>>     separate library, ObjCopy (in the LLVM tree).
>>>>>>
>>>>>> 2. Create a command line utility using the existing DWARFLinker and
>>>>>>     ObjCopy implementations. The first version is supposed to work
>>>>>>     with ELF input object files only. It would take an input ELF file
>>>>>>     with unoptimized debug info and create an output ELF file with
>>>>>>     optimized debug info. That version would be done out of the llvm
>>>>>>     tree.
>>>>>>
>>>>>> 3. Make the tool able to work in multi-threaded mode.
>>>>>>
>>>>>> 4. Consider including it in the LLVM tree.
>>>>>>
>>>>>> 5. Support DWARF5 tables.
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> Appendix A. Should this tool be implemented as a new tool or as an
>>>>>>              extension to dsymutil/llvm-objcopy?
>>>>>>
>>>>>>     There already exists a tool which removes obsolete debug info on
>>>>>>     darwin: dsymutil. Why create another tool instead of extending
>>>>>>     the existing dsymutil/llvm-objcopy?
>>>>>>
>>>>>>     The main functionality of dsymutil is located in a separate
>>>>>>     library, DWARFLinker. Thus, the dsymutil utility is a command-line
>>>>>>     interface for DWARFLinker. dsymutil has a different kind of
>>>>>>     input/output data: it takes several object files and an address
>>>>>>     map as input and creates a .dSYM bundle with linked debug info as
>>>>>>     output. llvm-dwarfutil would take a built executable as input and
>>>>>>     create an optimized executable as output. Additionally, there
>>>>>>     would be many command-line options specific to only one utility.
>>>>>>     This means that these utilities (implementing the command-line
>>>>>>     interface) would differ significantly. It makes sense not to put
>>>>>>     another command-line utility inside the existing dsymutil, but to
>>>>>>     make it a separate utility. That is the reason why llvm-dwarfutil
>>>>>>     is suggested to be implemented not as a sub-part of dsymutil but
>>>>>>     as a separate tool.
>>>>>>
>>>>>>     Please share your preference: should llvm-dwarfutil be a
>>>>>>     separate utility, or a variant of dsymutil compiled for ELF?
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> Appendix B. The MachO object file format is already supported by
>>>>>>     dsymutil. Whether MachO would be supported depends on the decision
>>>>>>     whether llvm-dwarfutil is done as a subproject of dsymutil or as
>>>>>>     a separate utility.
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> Appendix C. Support for the COFF and WASM object file formats is
>>>>>>      presented as a possible future improvement. It would be quite
>>>>>>      easy to add them, given that llvm-objcopy already supports
>>>>>>      these formats. It would also require supporting the
>>>>>>      DWARF6-suggested tombstone values (-1/-2).
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> Appendix D. Documentation.
>>>>>>
>>>>>>    - proposal for DWARF6 which suggested -1/-2 values for marking bad
>>>>>>      addresses: http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>>>>>>    - dsymutil tool: https://llvm.org/docs/CommandGuide/dsymutil.html
>>>>>>    - proposal "Remove obsolete debug info in lld.":
>>>>>>      http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html
>>>>>>
>>>>>> ======================================================================
>>>>>>
>>>>>> Appendix E. Possible command line options:
>>>>>>
>>>>>> DwarfUtil Options:
>>>>>>
>>>>>>    --build-aranges           - generate .debug_aranges table.
>>>>>>    --build-debug-names       - generate .debug_names table.
>>>>>>    --build-debug-pubnames    - generate .debug_pubnames table.
>>>>>>    --build-debug-pubtypes    - generate .debug_pubtypes table.
>>>>>>    --build-gdb-index         - generate .gdb_index table.
>>>>>>    --compress                - Compress debug tables.
>>>>>>    --decompress              - Decompress debug tables.
>>>>>>    --deduplicate-types       - Do ODR deduplication for debug types.
>>>>>>    --garbage-collect         - Do garbage collecting for debug info.
>>>>>>    --num-threads=<n>         - Specify the maximum number (n) of
>>>>>>                                simultaneous threads to use when
>>>>>>                                optimizing input file. Defaults to the
>>>>>>                                number of cores on the current machine.
>>>>>>    --strip-all               - Strip all debug tables.
>>>>>>    --strip=<name1,name2>     - Strip specified debug info tables.
>>>>>>    --strip-unoptimized-debug - Strip all unoptimized debug tables.
>>>>>>    --tombstone=<value>       - Tombstone value used as a marker of
>>>>>>                                invalid address.
>>>>>>      =bfd                    -   BFD default value
>>>>>>      =dwarf6                 -   Dwarf v6.
>>>>>>    --verbose                 - Enable verbose logging and encoding
>>>>>>                                details.
>>>>>>
>>>>>> Generic Options:
>>>>>>
>>>>>>    --help                    - Display available options
>>>>>>                                (--help-hidden for more)
>>>>>>    --version                 - Display the version of this program
>>>>>>
>>>>>>
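
For reference, the tombstone rule from implementation note 6 in the quoted
proposal (-2 for .debug_ranges and .debug_loc, -1 for other .debug* tables)
could be sketched as below. tombstoneFor is a hypothetical helper, not an
existing LLVM function. As far as I understand it, the reason for the
special case is that in pre-DWARF5 .debug_ranges/.debug_loc the maximum
address value already marks a base address selection entry, so -1 can't
double as a tombstone there.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the tombstone address for a given debug
// section, following the -1/-2 convention described in the proposal.
// AddrSize is the target address size in bytes (only 4 and 8 handled here).
uint64_t tombstoneFor(const std::string &Section, unsigned AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");
  const uint64_t MaxAddr = AddrSize == 8 ? UINT64_MAX : UINT32_MAX;

  // In these sections the maximum address is reserved for base address
  // selection entries, so the tombstone is one less than the maximum (-2).
  const bool MaxIsReserved =
      Section == ".debug_ranges" || Section == ".debug_loc";
  return MaxIsReserved ? MaxAddr - 1 : MaxAddr;
}
```

So for an 8-byte address size, .debug_ranges gets 0xfffffffffffffffe and
.debug_info gets 0xffffffffffffffff.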