<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 4, 2020 at 3:42 AM Alexey <<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 03.09.2020 20:56, David Blaikie
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Sep 3, 2020 at 5:15
AM Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 03.09.2020 01:36, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Sep 2,
2020 at 3:26 PM Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 02.09.2020 21:44, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed,
Sep 2, 2020 at 9:56 AM Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 01.09.2020 20:07, David
Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Fair enough - thanks
for clarifying the differences!
(I'd still lean a bit towards this
being dwz-esque, as you say "an
extension of classic dwz"</div>
</blockquote>
I have some doubts about "llvm-dwz",
since it might confuse people who
would expect exactly the same
behavior as dwz.<br>
But if we think of it as "an extension
of classic dwz" and the possible
confusion is not a big deal, then<br>
I would be fine with "llvm-dwz".<br>
<blockquote type="cite">
<div dir="ltr"> using a bit more
domain knowledge (of terminators
and C++ odr - though I'm not sure
dsymutil does rely on the ODR,
does it? It relies on it to know
that two names represent the same
type, I suppose, but doesn't
assume they're already identical,
instead it merges their members))<br>
</div>
</blockquote>
<p>If dsymutil is able to find a full
definition, it removes all other
definitions (matched by name) and
redirects all references to the
definition it found. If it is not able
to find a full definition, it does
nothing, i.e. if there are two
incomplete definitions
(DW_AT_declaration (true)) with the
same name, they are not merged. That
is a possible improvement: teaching
dsymutil to merge incomplete types.<br>
</p>
</div>
</blockquote>
<div>Huh, what does it do with extra
member function definitions found in
later definitions? (eg: struct x {
template<typename T> void f(); };
- in one translation unit
x::f<int> is instantiated, in
another x::f<float> is
instantiated - how are the two
represented with dsymutil?) <br>
</div>
</div>
</div>
</blockquote>
<p>They would be considered two different,
unmatched types. dsymutil would not merge them
and thus would not use a single type
description. There would be two separate types
called "x" whose members mostly match
but which differ in x::f<int> and
x::f<float>. There is no de-duplication in
that case.</p>
</div>
</blockquote>
<div>Oh, that's unfortunate. It'd be nice for C++ at
least, to implement a potentially faster dsymutil
mode that could get this right and not have to
actually check for type equivalence, instead
relying on the name of the type to determine that
it must be identical.<br>
</div>
</div>
</div>
</blockquote>
<p>Right. That would result in even more size reduction.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
The first instance of the type that's encountered
has its fully qualified name or mangled name
recorded in a map pointing to the DIE. Any future
instance gets downgraded to a declaration, and
/certain/ members get dropped, but other members
get stuck on the declaration (same sort of DWARF
you see with "struct foo { virtual void f1();
template<typename T> void f2() { } }; void
test(foo& f) { f.f2<int>(); }").
Recording all the member functions of the
type/static member variable types might be needed
in cases where some member functions are defined
in one translation unit and some defined in
another - though I guess that infrastructure is
already in place/that just works today.<br>
</div>
</div>
</div>
</blockquote>
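<div>The scheme quoted above (record the first definition of a type under its name, then downgrade later instances to declarations that keep only the members the first definition lacks) could be sketched roughly as follows. This is only an illustration of the idea, not DWARFLinker code: TypeDie, TypeDeduplicator and visit are hypothetical names, and a real implementation would operate on DIEs rather than on lists of strings.</div>

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for a DWARF type DIE.
struct TypeDie {
  std::uint64_t Offset;             // offset of the DIE in .debug_info
  std::string QualifiedName;        // fully qualified or mangled type name
  std::vector<std::string> Members; // linkage names of member declarations
  bool IsDeclaration;               // true once downgraded to a declaration
};

// Hypothetical helper: remembers the first definition seen for each name.
class TypeDeduplicator {
public:
  // Returns the offset of the canonical (first seen) definition.
  // A later instance is turned into a declaration that keeps only
  // the members missing from the canonical definition.
  std::uint64_t visit(TypeDie &Die) {
    auto Result = Canonical.emplace(Die.QualifiedName, Die);
    if (Result.second)
      return Die.Offset; // first instance: stays a full definition

    const TypeDie &First = Result.first->second;
    std::vector<std::string> Extra;
    for (const std::string &Member : Die.Members) {
      bool Known = std::find(First.Members.begin(), First.Members.end(),
                             Member) != First.Members.end();
      if (!Known)
        Extra.push_back(Member); // e.g. a new template instantiation
    }
    Die.Members = std::move(Extra);
    Die.IsDeclaration = true;
    return First.Offset;
  }

private:
  std::unordered_map<std::string, TypeDie> Canonical;
};
```

<div>With this sketch, a second "foo" that additionally declares f2<int> would come back as a declaration carrying just that one member, while the caller gets the offset of the first "foo" to use for references.</div>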
My understanding is that there is no such infrastructure
currently. The current infrastructure allows other units to
reference a single existing (canonical) type declaration.
It does not allow referencing different parts (in
different units) of an incomplete type.<br>
</div>
</blockquote>
<div><br>
Huh, so what does the DWARF look like when you define one
member function in one file, and another member function
(common with inline functions) in another file?<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>I think it would be necessary to change the order in
which compilation units are processed to implement such
type merging. </div>
</blockquote>
<div><br>
Oh, I wasn't suggesting merging them - or didn't mean to
suggest that. I meant doing something like what we do in
LLVM for type homed (no-standalone) DWARF, where we attach
function declarations to type declarations, eg:<br>
<br>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">struct
x {</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><span> </span>void
f1();</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><span> </span>void
f2();</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><span> </span>template<typename
T></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><span> </span>static
void f3();</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">};</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">#ifdef
HOME</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">void
x::f1() {</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">}</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">#endif</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">#ifdef
AWAY</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">void
x::f2() {</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">}</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">#endif</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">#ifdef
TEMPL</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">template<typename
T></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">void
x::f3() {</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">}</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">template
void x::f3<int>();</span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:11px;line-height:normal;font-family:Menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">#endif<br>
<br>
Building "HOME" would show the DWARF I'd expect to see
the first time a type definition is encountered during
dsym.<br>
Building "AWAY" raises the question of - what does
dsymutil do with this DWARF? Does it deduplicate the
type, and make the definition of 'f2' point to the 'f2'
declaration in the original type described in the prior
CU defined in "HOME"? If it doesn't do that, it
could/that would be good to reduce the DWARF size.<br>
Building "TEMPL" would show the DWARF I'd expect to see
if a future use of that type definition was encountered
but the original/home definition had no declaration of
this function: we should then emit maybe an "extension"
to the type (could be a straight declaration, or maybe
some newer/weirder hybrid that points to the definition
with some attribute) & then inject the declaration
of the template/other new member into this extension
definition, etc.<br>
</span></p>
</div>
</div>
</div>
</blockquote>
Please see the reduced DWARF generated by the current dsymutil for
the above example:<br>
<br>
0x0000000b: DW_TAG_compile_unit<br>
DW_AT_language (DW_LANG_C_plus_plus)<br>
DW_AT_name ("home.cpp")<br>
DW_AT_stmt_list (0x00000000)<br>
DW_AT_low_pc (0x0000000100000f80)<br>
DW_AT_high_pc (0x0000000100000f8b)<br>
<br>
0x0000002a: DW_TAG_structure_type<br>
DW_AT_name ("x")<br>
DW_AT_byte_size (0x01)<br>
<br>
0x00000033: DW_TAG_subprogram<br>
DW_AT_linkage_name ("_ZN1x2f1Ev")<br>
DW_AT_name ("f1")<br>
DW_AT_type (0x000000000000005e "int")<br>
DW_AT_declaration (true)<br>
DW_AT_external (true)<br>
DW_AT_APPLE_optimized (true)<br>
<br>
0x00000047: NULL<br>
<br>
0x00000048: DW_TAG_subprogram<br>
DW_AT_linkage_name ("_ZN1x2f2Ev")<br>
DW_AT_name ("f2")<br>
DW_AT_type (0x000000000000005e "int")<br>
DW_AT_declaration (true)<br>
DW_AT_external (true)<br>
DW_AT_APPLE_optimized (true)<br>
<br>
0x0000005c: NULL<br>
0x0000005d: NULL<br>
<br>
0x0000006a: DW_TAG_subprogram<br>
DW_AT_low_pc (0x0000000100000f80)<br>
DW_AT_high_pc (0x0000000100000f8b)<br>
DW_AT_specification (0x0000000000000033
"_ZN1x2f1Ev") <br>
<br>
<br>
0x000000a0: DW_TAG_compile_unit<br>
DW_AT_language (DW_LANG_C_plus_plus)<br>
DW_AT_name ("away.cpp")<br>
DW_AT_stmt_list (0x00000048)<br>
DW_AT_low_pc (0x0000000100000f90)<br>
DW_AT_high_pc (0x0000000100000f9b)<br>
<br>
0x000000c6: DW_TAG_subprogram<br>
DW_AT_low_pc (0x0000000100000f90)<br>
DW_AT_high_pc (0x0000000100000f9b)<br>
DW_AT_specification (0x0000000000000048
"_ZN1x2f2Ev") <br>
<br>
0x000000fc: DW_TAG_compile_unit<br>
DW_AT_language (DW_LANG_C_plus_plus)<br>
DW_AT_name ("templ.cpp")<br>
DW_AT_stmt_list (0x00000090)<br>
DW_AT_low_pc (0x0000000100000fa0)<br>
DW_AT_high_pc (0x0000000100000fab)<br>
<br>
0x0000011b: DW_TAG_structure_type<br>
DW_AT_name ("x")<br>
DW_AT_byte_size (0x01)<br>
<br>
0x00000124: DW_TAG_subprogram<br>
DW_AT_linkage_name ("_ZN1x2f1Ev")<br>
DW_AT_name ("f1")<br>
DW_AT_type (0x0000000000000168 "int")<br>
DW_AT_declaration (true)<br>
DW_AT_external (true)<br>
DW_AT_APPLE_optimized (true)<br>
0x00000138: NULL<br>
<br>
0x00000139: DW_TAG_subprogram<br>
DW_AT_linkage_name ("_ZN1x2f2Ev")<br>
DW_AT_name ("f2")<br>
DW_AT_type (0x0000000000000168 "int")<br>
DW_AT_declaration (true)<br>
DW_AT_external (true)<br>
DW_AT_APPLE_optimized (true)<br>
0x0000014d: NULL<br>
<br>
0x0000014e: DW_TAG_subprogram<br>
DW_AT_linkage_name ("_ZN1x2f3IiEEiv")<br>
DW_AT_name ("f3<int>")<br>
DW_AT_type (0x0000000000000168 "int")<br>
DW_AT_declaration (true)<br>
DW_AT_external (true)<br>
DW_AT_APPLE_optimized (true)<br>
0x00000166: NULL<br>
0x00000167: NULL<br>
<br>
0x00000174: DW_TAG_subprogram<br>
DW_AT_low_pc (0x0000000100000fa0)<br>
DW_AT_high_pc (0x0000000100000fab)<br>
DW_AT_specification (0x000000000000014e
"_ZN1x2f3IiEEiv")<br>
0x00000190: NULL<br>
<br>
<br>
>Building "HOME" would show the DWARF I'd expect to see the first
time a type definition is encountered during dsym.<br>
<br>
Compile unit "home.cpp" contains the type definition (0x0000002a) and
a reference to its member (DW_AT_specification (0x0000000000000033
"_ZN1x2f1Ev")).<br>
<br>
>Building "AWAY" raises the question of - what does dsymutil do
with this DWARF? Does it deduplicate the type, and make the
definition of 'f2' point to the 'f2' declaration in the original
type described in the prior CU defined in "HOME"? If it doesn't do
that, it could/that would be good to reduce the DWARF size.<br>
<br>
Compile unit "away.cpp" does not contain a type definition; it
references the type definition from compile unit "home.cpp"
(DW_AT_specification (0x0000000000000048 "_ZN1x2f2Ev")).<br>
That is, dsymutil deduplicates the type and makes the definition of 'f2'
point to the 'f2' declaration in the original type described in the
prior CU "home.cpp".<br>
<br>
>Building "TEMPL" would show the DWARF I'd expect to see if a
future use of that type definition was encountered but the
original/home definition had no declaration of this function: we
should then emit maybe an "extension" to the type (could be a
straight declaration, or maybe some newer/weirder hybrid that points
to the definition with some attribute) & then inject the
declaration of the template/other new member into this extension
definition, etc.<br>
<br>
Compile unit "templ.cpp" contains the type definition (0x0000011b),
which matches (0x0000002a) but also declares the new member
0x0000014e.<br>
It also references this new member via DW_AT_specification
(0x000000000000014e "_ZN1x2f3IiEEiv"). In this case the type description
is not de-duplicated.<br></div></blockquote><div><br></div><div>Ah, yeah - that seems like a missed opportunity - duplicating the whole type DIE. LTO does this by making monolithic types - merging all the members from different definitions of the same type into one, but that's maybe too expensive for dsymutil (might still be interesting to know how much more expensive, etc). But I think the other way to go would be to produce a declaration of the type, with the relevant members - and let the DWARF consumer identify this declaration as matching up with the earlier definition. That's the sort of DWARF you get from the non-MachO default -fno-standalone-debug anyway, so it's already pretty well tested/supported (support in lldb's a bit younger/more work-in-progress, admittedly). I wonder how much dsym size there is that could be reduced by such an implementation.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<br>
Are you suggesting that 0x0000011b should be transformed into something
like this:<br>
<br>
0x000000fc: DW_TAG_compile_unit<br>
DW_AT_language (DW_LANG_C_plus_plus)<br>
DW_AT_name ("templ.cpp")<br>
DW_AT_stmt_list (0x00000090)<br>
DW_AT_low_pc (0x0000000100000fa0)<br>
DW_AT_high_pc (0x0000000100000fab)<br>
<br>
0x0000011b: DW_TAG_structure_type<br>
DW_AT_specification (0x0000002a "x")<br>
<br>
0x00000124: DW_TAG_subprogram<br>
DW_AT_linkage_name ("_ZN1x2f3IiEEiv")<br>
DW_AT_name ("f3<int>")<br>
DW_AT_type (0x000000000000005e "int")<br>
DW_AT_declaration (true)<br>
DW_AT_external (true)<br>
DW_AT_APPLE_optimized (true)<br>
0x00000138: NULL<br>
0x00000139: NULL<br>
<br>
0x00000140: DW_TAG_subprogram<br>
DW_AT_low_pc (0x0000000100000fa0)<br>
DW_AT_high_pc (0x0000000100000fab)<br>
DW_AT_specification (0x0000000000000124
"_ZN1x2f3IiEEiv")<br>
0x00000155: NULL<br>
<br>
Did I get the idea correctly?<br></div></blockquote><div><br></div><div>Yep, more or less. It'd be "safer" if 11b didn't use DW_AT_specification to refer to 2a, but instead was only a completely independent declaration of "x" - that path is already well supported/tested (well, it's the work-in-progress stuff for lldb to support -fno-standalone-debug, but gdb's been consuming DWARF like this for years, Clang and GCC both produce DWARF like this (if the type is "homed" in another file, then Clang/GCC produce DWARF that emits a declaration with just the members needed to define any member functions defined/inlined/referenced in this CU)) for years.<br><br>But using DW_AT_specification, or maybe some other extension attribute, might make the consumer's task a bit easier (could do both - use an extension attribute to tie them up, leave DW_AT_declaration/DW_AT_name here for consumers that don't understand the extension attribute) in finding that they're all the same type/pieces of the same type.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<p><br>
</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>Currently, once a compilation unit has been
analyzed (scanned for types and dead info), it starts being
emitted.<br>
It looks like, to support merging, it would be necessary
to analyze all CUs first (to create a canonical
representation) and only then start emitting them. <br>
<br>
I am going to start working on a prototype of a parallel,
per-compilation-unit implementation of DWARFLinker <br>
(based on the scenario that Jonas described in another
letter in this thread).<br>
The type merging could be the next step...<br>
<br>
The following is the result of compiling your example on
Darwin (showing that dsymutil does not merge such types):<br>
</div>
</blockquote>
<div><br>
Ah, yeah, that is unfortunate - so if there were other
members of "x" they would be duplicated in this case, right?<br>
<br>
This is a pretty common issue in C++ - there are 3 reasons I
know of where LLVM would produce distinct descriptions:<br>
1) member function templates, like this<br>
2) member/nested types<br>
3) implicit special members (not present unless instantiated
- so if you copy construct an object in one file and not in
another, two different types)<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> <br>
$ cat struct.h<br>
<br>
#ifndef MY_H<br>
#define MY_H<br>
<br>
struct foo {<br>
template <class T> int fff () { return sizeof(T);
} <br>
};<br>
<br>
#endif // MY_H<br>
<br>
$ cat mod1.cpp <br>
<br>
#include "struct.h"<br>
int test1 ( ) {<br>
foo var;<br>
return var.fff<int>();<br>
}<br>
<br>
$ cat mod2.cpp <br>
<br>
#include "struct.h"<br>
int test2 ( ) {<br>
foo var;<br>
return var.fff<float>();<br>
}<br>
<br>
$ cat main.cpp <br>
<br>
#include "struct.h"<br>
int test1();<br>
int test2();<br>
int main ( void ) {<br>
test1();<br>
test2();<br>
return 0;<br>
} <br>
<br>
$ clang++ main.cpp mod1.cpp mod2.cpp -O -g -fno-inline<br>
<br>
$ llvm-dwarfdump -a
a.out.dSYM/Contents/Resources/DWARF/a.out | less<br>
<br>
0x00000056: DW_TAG_compile_unit<br>
<br>
DW_AT_language (DW_LANG_C_plus_plus)<br>
DW_AT_name ("mod1.cpp")<br>
<br>
0x000000ae: DW_TAG_structure_type
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<br>
<br>
DW_AT_name ("foo")<br>
DW_AT_byte_size (0x01)<br>
<br>
0x000000b7: DW_TAG_subprogram<br>
<br>
DW_AT_linkage_name
("_ZN3foo3fffIiEEiv")<br>
DW_AT_name ("fff<int>")<br>
<br>
<br>
0x0000011f: DW_TAG_compile_unit<br>
<br>
DW_AT_language (DW_LANG_C_plus_plus)<br>
DW_AT_name ("mod2.cpp")<br>
<br>
0x00000177: DW_TAG_structure_type
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<br>
<br>
DW_AT_name ("foo")<br>
DW_AT_byte_size (0x01)<br>
<br>
0x00000180: DW_TAG_subprogram<br>
<br>
DW_AT_linkage_name
("_ZN3foo3fffIfEEiv")<br>
DW_AT_name ("fff<float>")<br>
<br>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
- Dave</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr"><br>
But I don't have super strong
feelings about the naming.</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Tue, Sep 1, 2020 at 6:36 AM
Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 01.09.2020 06:27,
David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">A quick note:
The feature as currently
proposed sounds like it's
an exact match for 'dwz'?
Is there any benefit to
this over the existing dwz
project? Is it different
in some ways I'm not aware
of? (I haven't actually
used dwz, so I might have
some mistaken ideas about
how it should work)<br>
<br>
If it's going to solve the
same general problem, but
be in the llvm project
instead, then maybe it
should be called llvm-dwz.<br>
</div>
</blockquote>
It looks like dwz and
llvm-dwarfutil do not exactly
match in functionality. <br>
<br>
dwz is a program that
attempts to optimize DWARF
debugging information <br>
contained in ELF shared
libraries and ELF executables
for *size*.<br>
<br>
llvm-dwarfutil is a tool that
is used for processing debug<br>
info(DWARF) located in built
binary files to improve debug
info *quality*,<br>
reduce debug info *size* and
accelerate debug info
*processing*.<br>
<br>
Things that are supposed to
be done by llvm-dwarfutil and
that are not <br>
done by dwz: removing obsolete
debug info, building indexes,
stripping <br>
unneeded debug sections,
compressing/decompressing debug
sections.<br>
<br>
What they have in common is that
both tools reduce the size of
debug info. <br>
But they do this using
different approaches:<br>
<br>
1. dwz reduces the size of
debug info by creating partial
compilation units <br>
for duplicated parts, so
that these partial compilation
units can be imported <br>
in every place where a duplicate occurred.
AFAIU, that optimization gives
the largest size saving.<br>
<br>
Another size-saving
optimization is ODR type
deduplication.<br>
<br>
2. llvm-dwarfutil reduces the
size of debug info by ODR
type deduplication, <br>
which gives the largest size
saving in the
llvm-dwarfutil case. <br>
<br>
Another size-saving
optimization is removing
obsolete debug info<br>
(which is actually not only
about size but also about
correctness).<br>
<br>
So it looks like these tools
are not equivalent. If we
consider the new tool <br>
to be an extension of
classic dwz, then we could
probably<br>
name it llvm-dwz.<br>
<br>
<blockquote type="cite">
<div dir="ltr"><br>
Though I understand the
desire for this to grow
other functionality, like
DWARF-aware dwp-ing. Might
be better for this to
busybox and provide that
functionality under
llvm-dwp instead, or more
likely, I suspect, the
existing llvm-dwp will be
rewritten (probably by me)
to use more of lld's
infrastructure to be more
efficient (its current
object reading/writing
logic is using LLVM's
libObject and MCStreamer,
which is a bit inefficient
for a very content-unaware
linking process) and then
maybe that could be taught
to use DwarfLinker as a
library to optionally do
DWARF-aware linking
depending on the user's
time/space tradeoff
desires. Still benefiting
from any improvements to
the underlying DwarfLinker
library (at which point
that would be shared
between llvm-dsymutil,
llvm-dwz, and llvm-dwp).</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Tue, Aug 25, 2020 at
7:29 AM Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
We propose
llvm-dwarfutil - a
dsymutil-like tool for
ELF.<br>
Any thoughts on this?<br>
Thanks in advance,
Alexey.<br>
<br>
======================================================================<br>
<br>
llvm-dwarfutil(Apndx A)
- is a tool that is used
for processing debug <br>
info(DWARF)<br>
located in built binary
files to improve debug
info quality,<br>
reduce debug info size
and accelerate debug
info processing.<br>
Supported object files
formats: ELF,
MachO(Apndx B),
COFF(Apndx C), <br>
WASM(Apndx C).<br>
<br>
======================================================================<br>
<br>
Specifically, the tool
would do:<br>
<br>
- Remove obsolete
debug info which refers
to code deleted by the
linker<br>
doing the garbage
collection
(gc-sections).<br>
<br>
- Deduplicate debug
type definitions for
reducing resulting size
of <br>
binary.<br>
<br>
- Build
accelerator/index
tables.<br>
= .debug_aranges,
.debug_names,
.gdb_index,
.debug_pubnames, <br>
.debug_pubtypes.<br>
<br>
- Strip unneeded
tables.<br>
= .debug_aranges,
.debug_names,
.gdb_index,
.debug_pubnames, <br>
.debug_pubtypes.<br>
<br>
- Compress or
decompress debug info as
requested.<br>
<br>
Possible feature:<br>
<br>
- Join split dwarf
.dwo files in a single
file containing all
debug info<br>
(convert split
DWARF into monolithic
DWARF).<br>
<br>
======================================================================<br>
<br>
User interface:<br>
<br>
OVERVIEW: A tool for
optimizing debug info
located in the built
binary.<br>
<br>
USAGE: llvm-dwarfutil
[options] input output<br>
<br>
OPTIONS: (Apndx E)<br>
<br>
======================================================================<br>
<br>
Implementation notes:<br>
<br>
1. Removing obsolete
debug info would be done
using DWARFLinker llvm <br>
library.<br>
<br>
2. Data types
deduplication would be
done using DWARFLinker
llvm library.<br>
<br>
3. Accelerator/index
tables would be
generated using
DWARFLinker llvm <br>
library.<br>
<br>
4. Interface of
DWARFLinker library
would be changed in such
way that it<br>
would be possible to
switch on/off various
stages:<br>
<br>
class DWARFLinker {<br>
setDoRemoveObsoleteInfo
( bool
DoRemoveObsoleteInfo =
false);<br>
<br>
setDoAppleNames (
bool DoAppleNames =
false );<br>
setDoAppleNamespaces (
bool DoAppleNamespaces =
false );<br>
setDoAppleTypes (
bool DoAppleTypes =
false );<br>
setDoObjC ( bool
DoObjC = false );<br>
setDoDebugPubNames
( bool DoDebugPubNames =
false );<br>
setDoDebugPubTypes
( bool DoDebugPubTypes =
false );<br>
<br>
setDoDebugNames
(bool DoDebugNames =
false);<br>
setDoGDBIndex (bool
DoGDBIndex = false);<br>
}<br>
<br>
5. Copying source file
contents, stripping
tables, <br>
compressing/decompressing tables<br>
would be done by
ObjCopy llvm
library(extracted from
llvm-objcopy):<br>
<br>
Error
executeObjcopyOnBinary(const
CopyConfig &Config,<br>
object::COFFObjectFile &In, Buffer
&Out);<br>
Error
executeObjcopyOnBinary(const
CopyConfig &Config,<br>
object::ELFObjectFileBase &In, Buffer
&Out);<br>
Error
executeObjcopyOnBinary(const
CopyConfig &Config,<br>
object::MachOObjectFile &In, Buffer
&Out);<br>
Error
executeObjcopyOnBinary(const
CopyConfig &Config,<br>
object::WasmObjectFile &In, Buffer
&Out);<br>
<br>
6. Address ranges and
single addresses
pointing to removed code
should <br>
be marked<br>
with tombstone value
in the input file:<br>
<br>
-2 for .debug_ranges
and .debug_loc.<br>
-1 for other .debug*
tables.<br>
<br>
7. Prototype
implementation - <a href="https://reviews.llvm.org/D86539" rel="noreferrer" target="_blank">https://reviews.llvm.org/D86539</a>.<br>
<br>
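A rough illustration of item 6 above: -1 and -2 denote the two largest values that fit in an address of the unit's address size. The function below is only a sketch; the name tombstoneFor and the use of section-name strings are assumptions made for this example, not part of DWARFLinker or lld. The separate value for .debug_ranges and .debug_loc is presumably needed because, in those tables, an entry starting with the largest representable address already has a meaning (base address selection).<br>
<br>

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns the tombstone value proposed in item 6
// for an address of AddrSize bytes stored in the given debug section.
std::uint64_t tombstoneFor(const std::string &Section, unsigned AddrSize) {
  // Largest representable address for this size, i.e. "-1".
  std::uint64_t MaxAddr = AddrSize >= 8
                              ? ~std::uint64_t(0)
                              : (std::uint64_t(1) << (AddrSize * 8)) - 1;

  // .debug_ranges and .debug_loc use "-2"; every other table uses "-1".
  bool UsesMinusTwo = Section == ".debug_ranges" || Section == ".debug_loc";
  return UsesMinusTwo ? MaxAddr - 1 : MaxAddr;
}
```

<br>
For an 8-byte address this yields 0xfffffffffffffffe for .debug_ranges and 0xffffffffffffffff for .debug_info; a consumer can then skip any address that compares equal to the tombstone of its section.<br>
<br>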
======================================================================<br>
<br>
Roadmap:<br>
<br>
1. Refactor llvm-objcopy
to extract its
implementation into
a separate <br>
library,<br>
ObjCopy (in the LLVM
tree).<br>
<br>
2. Create a command-line
utility using the existing
DWARFLinker and ObjCopy<br>
implementations.
The first version is
supposed to work with
ELF <br>
input object files only.<br>
It would take an input
ELF file with
unoptimized debug info
and create an <br>
output<br>
ELF file with
optimized debug info.
That version would be
developed outside <br>
of the LLVM tree.<br>
<br>
3. Make the tool
able to work in
multi-threaded mode.<br>
<br>
4. Consider including
it in the LLVM tree.<br>
<br>
5. Support DWARF5
tables.<br>
<br>
======================================================================<br>
<br>
Appendix A. Should this
tool be implemented as a
new tool or as an
extension<br>
to
dsymutil/llvm-objcopy?<br>
<br>
There already exists
a tool which removes
obsolete debug info on <br>
darwin - dsymutil.<br>
Why create another
tool instead of
extending the
existing <br>
dsymutil/llvm-objcopy?<br>
<br>
The main
functionality of
dsymutil is located in a
separate library <br>
- DWARFLinker.<br>
Thus, dsymutil
utility is a
command-line interface
for DWARFLinker. <br>
dsymutil has<br>
another type of
input/output data: it
takes several object
files and <br>
address map<br>
as input and creates
a .dSYM bundle with
linked debug info as <br>
output. llvm-dwarfutil<br>
would take a built
executable as input and
create an optimized <br>
executable as output.<br>
Additionally, there
would be many
command-line options
specific for <br>
only one utility.<br>
This means that
these
utilities (implementing the
command-line interface)
<br>
would differ<br>
significantly. It makes
sense not to put another
command-line utility <br>
inside the existing
dsymutil,<br>
but to make it a
separate utility. That
is the reason why <br>
llvm-dwarfutil is suggested
to be<br>
implemented not as
a sub-part of dsymutil but
as a separate tool.<br>
<br>
Please share your
preference: whether
llvm-dwarfutil should be<br>
separate utility, or
a variant of dsymutil
compiled for ELF?<br>
<br>
======================================================================<br>
<br>
Appendix B. The MachO
object file format is
already supported by
dsymutil.<br>
Depending on the
decision whether
llvm-dwarfutil is
done as a <br>
subproject<br>
of dsymutil or as a
separate utility, MachO
would be supported or
not.<br>
<br>
======================================================================<br>
<br>
Appendix C. Support for
the COFF and WASM object
file formats presented
as<br>
possible future
improvement. It would be
quite easy to add them <br>
assuming<br>
that llvm-objcopy
already supports these
formats. It also would
require<br>
supporting
DWARF6-suggested
tombstone values(-1/-2).<br>
<br>
======================================================================<br>
<br>
Appendix D.
Documentation.<br>
<br>
- proposal for DWARF6
which suggested -1/-2
values for marking bad <br>
addresses<br>
<a href="http://www.dwarfstd.org/ShowIssue.php?issue=200609.1" rel="noreferrer" target="_blank">http://www.dwarfstd.org/ShowIssue.php?issue=200609.1</a><br>
- dsymutil tool <a href="https://llvm.org/docs/CommandGuide/dsymutil.html" rel="noreferrer" target="_blank">https://llvm.org/docs/CommandGuide/dsymutil.html</a>.<br>
- proposal "Remove
obsolete debug info in
lld."<br>
<a href="http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html" rel="noreferrer" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html</a><br>
<br>
======================================================================<br>
<br>
Appendix E. Possible
command line options:<br>
<br>
DwarfUtil Options:<br>
<br>
--build-aranges
- generate
.debug_aranges table.<br>
--build-debug-names
- generate .debug_names
table.<br>
--build-debug-pubnames
- generate
.debug_pubnames table.<br>
--build-debug-pubtypes
- generate
.debug_pubtypes table.<br>
--build-gdb-index
- generate .gdb_index
table.<br>
--compress
- Compress debug tables.<br>
--decompress
- Decompress debug
tables.<br>
--deduplicate-types
- Do ODR deduplication
for debug types.<br>
--garbage-collect
- Do garbage collecting
for debug info.<br>
--num-threads=<n>
- Specify the maximum
number (n) of <br>
simultaneous threads<br>
to use when optimizing input file.<br>
Defaults to the number of cores on the <br>
current machine.<br>
--strip-all
- Strip all debug
tables.<br>
--strip=<name1,name2>
- Strip specified debug
info tables.<br>
--strip-unoptimized-debug
- Strip all unoptimized
debug tables.<br>
--tombstone=<value>
- Tombstone value used
as a marker of <br>
invalid address.<br>
=bfd
- BFD default value<br>
=dwarf6
- Dwarf v6.<br>
--verbose
- Enable verbose logging
and encoding details.<br>
<br>
Generic Options:<br>
<br>
--help
- Display available
options (--help-hidden <br>
for more)<br>
--version
- Display the version of
this program<br>
<br>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote></div></div>