<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Jonas, <br>
<br>
Thank you for the comments, please find my answers below...<br>
</p>
<div class="moz-cite-prefix">On 06.08.2020 20:39, Jonas Devlieghere
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">
<div>Hi Alexey,</div>
<div><br>
</div>
<div>I should've looked at this earlier. I went through the
thread again and I've</div>
<div>made some comments, mostly from the dsymutil point of
view.</div>
<div><br>
</div>
<div>> Current DWARFEmitter/DWARFStreamer has an
implementation for DWARF</div>
<div>> generation, which does not support DWARF5(only
debug_names table). At the</div>
<div>> same time, there already exists code in
CodeGen/AsmPrinter/DwarfDebug.h,</div>
<div>> which implements most of DWARF5. It seems that
DWARFEmitter/DWARFStreamer</div>
<div>> should be rewritten using DwarfDebug/DwarfFile.
Though I am not sure</div>
<div>> whether it would be easy to re-use
DwarfDebug/DwarfFile. It would probably</div>
<div>> be necessary to separate some intermediate level of
DwarfDebug/DwarfFile.</div>
<div><br>
</div>
<div>These classes serve very different purposes. Last time I
looked at them there</div>
<div>was very little overlap in functionality. In the compiler
we're mostly</div>
<div>concerned with generating the DWARF, while in dsymutil we
try to copy</div>
<div>everything we don't need to parse, and fix up what we
have to. I don't want</div>
<div>to say it's not possible, but I think supporting DWARF5
in those classes is</div>
<div>going to be a lot less work than trying to reuse the
CodeGen variants.</div>
</div>
</div>
</blockquote>
I agree: in its current state it would be less work to write a
separate implementation <br>
than to reuse the CodeGen variants. The downside is that in such a case
there is a lot of <br>
code duplication:<br>
<br>
DwarfStreamer::emitUnitRangesEntries<br>
DwarfDebug::emitDebugARanges<br>
EmitGenDwarfAranges<br>
DWARFYAML::emitDebugAranges<br>
<br>
Supporting a new standard would require rewriting or modifying all
of these places. Ideally,<br>
a single implementation of DWARF generation would allow
changing one place and getting the <br>
benefit in the others. The CodeGen classes could probably be rewritten, and
then it would be useful<br>
to write them with two use cases in mind: generation from scratch
and copying/updating <br>
existing data. In the end, there would be a single implementation
which could be reused in <br>
many places. It is indeed a lot of work, though.<br>
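<br>
To make the duplication concrete, here is a minimal sketch (Python, not LLVM code) of one routine that <br>
serializes a single DWARF32 .debug_aranges set (version 2, little-endian). The function name and its <br>
parameters are hypothetical. The only point is that a "generate from scratch" caller and a <br>
"copy/update" caller could both hand (address, length) pairs to the same emitter.<br>
<pre>
```python
def _u(value, size):
    """Encode an unsigned integer as little-endian bytes."""
    return value.to_bytes(size, "little")


def emit_aranges_set(cu_offset, ranges, address_size=8):
    """Serialize one DWARF32 .debug_aranges set (version 2).

    cu_offset: offset of the owning unit in .debug_info.
    ranges:    iterable of (start_address, length) pairs.
    This is an illustrative helper, not an LLVM API.
    """
    tuple_size = 2 * address_size

    # Fields that follow unit_length: version, debug_info_offset,
    # address_size, segment_selector_size.
    header = _u(2, 2) + _u(cu_offset, 4) + _u(address_size, 1) + _u(0, 1)

    # The first tuple starts at a multiple of the tuple size, counted
    # from the start of the set (which includes the 4-byte unit_length).
    header_len = 4 + len(header)
    padding = b"\x00" * (-header_len % tuple_size)

    body = b"".join(
        _u(start, address_size) + _u(length, address_size)
        for start, length in ranges
    )
    terminator = b"\x00" * tuple_size  # a (0, 0) tuple ends the set

    payload = header + padding + body + terminator
    return _u(len(payload), 4) + payload
```
</pre>
With one range and 8-byte addresses the set is 48 bytes long: a 12-byte header, 4 bytes of padding, <br>
one 16-byte tuple and the 16-byte terminator.<br>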
<br>
<blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div>> Measurements show that it is spent ~10 sec in</div>
<div>> llvm::StringMapImpl::LookupBucketFor(). The problem
is that the same</div>
<div>> strings, again and again, are added to the string
pool. Two attributes</div>
<div>> having the same string value would be analyzed (hash
calculated) and</div>
<div>> searched inside the string pool. Even if these
strings are already in</div>
<div>> string table(DW_FORM_strp, DW_FORM_strx). The
process could be optimized</div>
<div>> for string tables. So that if some string from the
string table were</div>
<div>> accessed previously then, it would keep a reference
into the string pool.</div>
<div>> This would eliminate a lot of string pool searches.</div>
<div><br>
</div>
<div>I'm not sure I fully understand the optimization, but I'd
love to speed this</div>
<div>up, if only for dsymutil's sake. I'd love to talk about
this in a separate</div>
<div>thread or offline.</div>
<div><br>
</div>
</div>
</div>
</blockquote>
The measurements show that a significant amount of time is spent <br>
in llvm::StringMapImpl::LookupBucketFor(), i.e. searching the
string <br>
pool is expensive. The idea of the optimization
was to <br>
reduce the number of string pool searches by remembering previous <br>
results. The DW_FORM_strp and DW_FORM_strx forms do not hold the string itself
<br>
but reference a string in a separate table, by offset or by index. Currently, if
there are <br>
duplicate DW_FORM_strp/DW_FORM_strx strings, there are <br>
as many string pool searches as there are duplicates <br>
(llvm::StringMapImpl::LookupBucketFor() is called for each one). If the
position <br>
in the pool were remembered for the index of the first duplicate,
<br>
then it would not be necessary to call
llvm::StringMapImpl::LookupBucketFor() for the later ones.<br>
<br>
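A minimal sketch of that idea (Python, a toy model rather than the DWARFLinker code). The class and <br>
method names are hypothetical. The string pool is modeled as a dictionary, and the cloner keeps a <br>
per-input-file cache keyed by the offset in the input string table.<br>
<pre>
```python
class StringPool:
    """Toy output string pool: maps a string to its offset in the output table."""

    def __init__(self):
        self._offsets = {}
        self._size = 0
        self.lookups = 0  # number of pool searches, counted for illustration

    def get_offset(self, s):
        self.lookups += 1
        if s not in self._offsets:
            self._offsets[s] = self._size
            self._size += len(s.encode()) + 1  # +1 for the NUL terminator
        return self._offsets[s]


class CachingStringCloner:
    """Remembers, per input string-table offset, the resulting pool offset."""

    def __init__(self, pool, input_table):
        self._pool = pool
        self._input = input_table  # bytes of the input string table
        self._cache = {}           # input offset to output offset

    def clone_strp(self, in_offset):
        cached = self._cache.get(in_offset)
        if cached is not None:
            return cached  # seen before: no string hashing, no pool search
        end = self._input.index(b"\x00", in_offset)
        s = self._input[in_offset:end].decode()
        out = self._pool.get_offset(s)
        self._cache[in_offset] = out
        return out
```
</pre>
Cloning offsets 0, 4 and 0 again from the table b"int\x00char\x00" searches the pool only twice. Note <br>
that in this toy the cache is itself a dictionary keyed by an integer, so it only saves hashing the <br>
string bytes.<br>
<br>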
But prototyping that idea did not show any worthwhile performance
improvement. <br>
<br>
Some small performance improvement could be achieved if the string pools
used <br>
llvm::hash_value(StringRef S) instead of llvm::djbHash().
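<br>
<br>
For reference, a sketch of the DJB hash in Python. It mirrors the shape of llvm::djbHash() as I <br>
understand it (seed 5381, multiply by 33 and add each byte, 32-bit wraparound). It is meant to show <br>
what is computed per string, not to reproduce any measurement.<br>
<pre>
```python
def djb_hash(data, h=5381):
    """DJB hash over a bytes object, truncated to 32 bits."""
    for byte in data:
        # Same as (h shifted left by 5) + h + byte, i.e. h * 33 + byte.
        h = (h * 33 + byte) % 2**32
    return h
```
</pre>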
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div>> Currently, all object files are analyzed
sequentially and cloned</div>
<div>> sequentially. Cloning is started in parallel with
analyzing. That scheme</div>
<div>> could be changed: analyzing and cloning could be
done in parallel for each</div>
<div>> object file. That requires refactoring of
DWARFLinker and making string</div>
<div>> pools and DeclContextTree thread-safe.</div>
<div><br>
</div>
<div>I'm less familiar with the way that LLD uses the
DWARFOptimizer but this is</div>
<div>not possible for dsymutil as it is trying to deduplicate
DIEs from different</div>
<div>compile units.</div>
</div>
</div>
</blockquote>
Right, dsymutil is trying to de-duplicate DIEs from different<br>
compile units. That probably does not rule out a multi-threaded
implementation: <br>
<br>
1. DeclContextTree.getChildDeclContext() should be made thread-safe.<br>
Then, even if CUs are processed in parallel, DIEs could still be
de-duplicated<br>
based on DeclContext. <br>
2. UniquingStringPool and OffsetsStringPool should also be made
thread-safe.<br>
3. Since compilation units would be processed in parallel,<br>
the size of a compilation unit would not be known until it is
fully processed. <br>
That means that all of a compilation unit's references would have to be
patched after <br>
the CU content is generated, in the same manner as forward
references <br>
are currently patched (fixupForwardReferences).<br>
4. DWARFStreamer provides a sequential interface. Instead of a
single stream <br>
as the output, a separate output could be generated for each
CU. <br>
These outputs would be glued together at the end.<br>
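<br>
Points 3 and 4 can be sketched as follows (Python, a toy model that only illustrates the data flow). <br>
The unit layout, the names and the 4-byte reference size are assumptions made for the example. The <br>
shared DeclContextTree and string pools from points 1 and 2 are left out.<br>
<pre>
```python
from concurrent.futures import ThreadPoolExecutor

REF_SIZE = 4  # bytes reserved for each cross-unit reference (assumption)


def clone_unit(unit):
    """Produce one unit's bytes plus the list of references to patch later.

    unit is a dict: {"name": str, "payload": bytes, "refs": [target names]}.
    Each reference gets a zeroed placeholder, because the final offset of
    the target unit is unknown while units are processed independently.
    """
    out = bytearray(unit["payload"])
    fixups = []
    for target in unit["refs"]:
        fixups.append((len(out), target))  # (position in this unit, target)
        out += b"\x00" * REF_SIZE
    return unit["name"], out, fixups


def link_units(units, workers=4):
    # Step 1: clone every unit independently; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cloned = list(pool.map(clone_unit, units))

    # Step 2: all sizes are known now, so assign final offsets in order.
    offsets, total = {}, 0
    for name, data, _ in cloned:
        offsets[name] = total
        total += len(data)

    # Step 3: patch the placeholders, then glue the per-unit outputs.
    for _, data, fixups in cloned:
        for pos, target in fixups:
            data[pos:pos + REF_SIZE] = offsets[target].to_bytes(REF_SIZE, "little")
    return b"".join(bytes(data) for _, data, _ in cloned)
```
</pre>
The sequential part is reduced to the offset assignment and the final concatenation. Everything that <br>
depends only on a single unit happens in step 1.<br>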
<blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div>> I think improving dsymutil is a valuable thing.
Though there are several</div>
<div>> directions which might be considered to make it more
robust:</div>
<div>></div>
<div>> 1. support of latest DWARF - DWARF5/DWARF64...</div>
<div><br>
</div>
<div>Strong +1 on DWARF5. I haven't had the bandwidth yet to
really look at this.</div>
<div>Right now we can't find (at least some) relocations so
we bail out. I'd need</div>
<div>to fix that to assess the current state of things and
figure out how much</div>
<div>work would be needed.</div>
<div><br>
</div>
<div>I don't think anything in LLVM supports generating
DWARF64 though.</div>
<div><br>
</div>
<div>> 2. implement multi-threaded execution.</div>
<div><br>
</div>
<div>See my earlier comment. At least for the dsymutil case,
the current approach</div>
<div>is the best we can do, but I'd love to be proven wrong.
:-)</div>
<div><br>
</div>
<div>> 3. support of split DWARF.</div>
<div>> 4. implement dsymutil for non-darwin platform.</div>
<div><br>
</div>
<div>These two seem to go together. Given the work you did to
split off the DWARF</div>
<div>optimization part I think we're closer to this than ever.
Thanks again for</div>
<div>doing that.</div>
<div><br>
</div>
<div>> We considered three options:</div>
<div>></div>
<div>> 1. add new functionality into dsymutil. So that
dsymutil behaves</div>
<div>> differently on a non-darwin platform and supports
another set of</div>
<div>> command-line options.</div>
<div>></div>
<div>> 2. add new functionality into llvm-objcopy.
llvm-objcopy already supports</div>
<div>> various binary objects formats(MachO,ELF,COFF,wasm).
It also has several</div>
<div>> options to work with debug-info.</div>
<div>></div>
<div>> 3. create new utility llvm-dwarfutil which would
implement the above</div>
<div>> functionality and reuse DWARFLinker(extracted from
dsymutil) library and</div>
<div>> new library ObjectCopy(extracted from llvm-objcopy).</div>
<div>></div>
<div>> So far our preference is number three. The reason
for this is that separate</div>
<div>> utility specifically working with debug info looks
as good separation of</div>
<div>> concepts. Adding another behavior to dsymutil looks
not very good.</div>
<div><br>
</div>
<div>In its current state dsymutil itself is a pretty small
tool on top of the</div>
<div>DWARFOptimizer/Linker. I'm curious what the benefits of
another tool are</div>
<div>compared to a different frontend (like objcopy) for MachO
and ELF. It seems</div>
<div>like that would allow for separation of concerns, while
still being able to</div>
<div>share common code without having to push it all the way
up into LLVM.</div>
</div>
</div>
</blockquote>
My concern is that this tool would have different source data and
a different set of options.<br>
Given that handling a different set of input data and
a different set of options <br>
means writing another frontend, it would probably be better not
to make dsymutil more complex but<br>
to create another small tool. But if extending dsymutil looks OK,
I am OK with it. <br>
Let's discuss this approach in the proposal thread.
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div>> Extending the already rich interface of llvm-objcopy
looks also not very</div>
<div>> good. Having in mind that actual implementation
would be shared by</div>
<div>> libraries, the separate utility, working
specifically with debug info,</div>
<div>> looks like the right choice. That is our current
idea.</div>
<div><br>
</div>
<div>> My personal thought would be that extending dsymutil
should be ok as the</div>
<div>> functionality goes well with everything else
dsymutil does (other than not</div>
<div>> support ELF which the dsymutil maintainers are on
board with last I</div>
<div>> checked). That said, I definitely think a write-up
will be helpful. No</div>
<div>> matter what I support extracting all of the behavior
into libraries and</div>
<div>> using that somewhere :)</div>
<div><br>
</div>
<div>Ha, so basically what I was trying to say above.</div>
<div><br>
</div>
<div>I look forward to seeing the proposal!</div>
</div>
</div>
</blockquote>
<p>Yep, I will publish it soon.<br>
</p>
<p>Thank you, Alexey.<br>
</p>
<blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div>Cheers,</div>
<div>Jonas</div>
<div><br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Aug 4, 2020 at 11:33
PM Eric Christopher <<a href="mailto:echristo@gmail.com"
moz-do-not-send="true">echristo@gmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">Hi Alexey,
<div><br>
</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Aug 3, 2020 at
8:32 AM Alexey Lapshin <<a
href="mailto:avl.lapshin@gmail.com" target="_blank"
moz-do-not-send="true">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Eric, please <br>
</p>
<div>On 31.07.2020 22:02, Eric Christopher wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi Alexey,</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Jul
31, 2020 at 4:02 AM Alexey Lapshin via
llvm-dev <<a
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>
On 28.07.2020 19:28, David Blaikie wrote:<br>
> On Tue, Jul 28, 2020 at 8:55 AM Alexey
Lapshin <<a
href="mailto:avl.lapshin@gmail.com"
target="_blank" moz-do-not-send="true">avl.lapshin@gmail.com</a>>
wrote:<br>
>><br>
>> On 28.07.2020 10:29, David Blaikie
via llvm-dev wrote:<br>
>>> On Fri, Jun 26, 2020 at 9:28 AM
Alexey Lapshin<br>
>>> <<a
href="mailto:alapshin@accesssoftek.com"
target="_blank" moz-do-not-send="true">alapshin@accesssoftek.com</a>>
wrote:<br>
>>>>>>>>>>>> This idea goes in
another direction than fragmenting dwarf<br>
>>>>>>>>>>>> using elf
sections&tricks. It seems to me that the
cost of fragmenting is too high.<br>
>>>>>>>>>>> I
tend to agree - but I'm sort of leaning
towards trying to use object<br>
>>>>>>>>>>>
features as much as possible, then
implementing just enough custom<br>
>>>>>>>>>>>
handling in the linker to recoup overhead,
etc. (eg: add some kind of<br>
>>>>>>>>>>>
small header/brief description that makes it
easy for the linker to<br>
>>>>>>>>>>>
slice-and-dice - but hopefully a
domain-specific such header can be a<br>
>>>>>>>>>>>
bit more compact than the fully general ELF
form)<br>
>>>>>>>>>> I
think this indeed should be implemented and
evaluated.<br>
>>>>>>>>>> So
that various approaches could be compared.<br>
>>>>>>>>>><br>
>>>>>>>>>>>> It is not only the
sizes of structures describing fragments but
also the complexity<br>
>>>>>>>>>>>> of tools that should be
taught to work with fragmented DWARF.<br>
>>>>>>>>>>>> (f.e. llvm-dwarfdump
applied to object file should be able to read
fragmented DWARF,<br>
>>>>>>>>>>>> but applied to linked
executable it should work with non-fragmented
DWARF).<br>
>>>>>>>>>>>> That idea is for the
tool which works the same way as dsymutil ODR.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> I will shortly describe
the idea of making DWARF be easier processed
by dsymutil/DWARFLinker:<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> The idea is to have
only one "type table" per object file(special
section .debug_types_table).<br>
>>>>>>>>>>>> This "type table" would
contain all types.<br>
>>>>>>>>>>>> There could be a
special type of reference - type_offset - that
offset points into the type table.<br>
>>>>>>>>>>>> Basic types could
always be placed into the start of "type
table" thus, offsets to basic types<br>
>>>>>>>>>>>> most often would be 1
byte. There also would be a special kind of
reference - reference inside the type.<br>
>>>>>>>>>>>> Type units sig8 system
- would not be used to reference types.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> Types deduplication is
assumed to be done, not by linker mechanism
for COMDAT,<br>
>>>>>>>>>>>> but by a tool like
dsymutil. This tool would create resulting
.debug_types_table by putting there<br>
>>>>>>>>>>>> types from source
.debug_types_table-s. Only one copy of the
type would be placed into the<br>
>>>>>>>>>>>> resulting table. All
references pointing to the deleted copy would
be corrected to point<br>
>>>>>>>>>>>> to the single copy
inside "type table". (that is how dsymutil
works currently)<br>
>>>>>>>>>>> ^
that's the step that's probably a bit
expensive for a general-use<br>
>>>>>>>>>>>
tool - it implies parsing all the DWARF to
find those references and<br>
>>>>>>>>>>>
rewrite them, I think. For a high-performance
solution that could be<br>
>>>>>>>>>>>
run by the linker I think it'd be necessary to
have a solution that<br>
>>>>>>>>>>>
doesn't involve parsing all the DIEs.<br>
>>>>>>>>>>
According to the current dsymutil processing,<br>
>>>>>>>>>>
exactly this process is not the most
time-consuming.<br>
>>>>>>>>>> That
could be done relatively fast.<br>
>>>>>>>>> Fair
enough - though I'd still imagine any solution
that involves<br>
>>>>>>>>> parsing
all the DIEs still wouldn't be fast enough
(maybe an order of<br>
>>>>>>>>> magnitude
faster than the current solution even - but
that's still,<br>
>>>>>>>>> what, 6
or 7x slower than linking without the
feature?) for most users<br>
>>>>>>>>> to
consider it a good trade-off.<br>
>>>>>>>> It seems to
me that even the current 6x-7x slowdown could
be useful.<br>
>>>>>>>> Users who
already use dsymutil or llvm-dwp(assuming
DWARFLinker<br>
>>>>>>>> would be
taught to work with a split dwarf) tools spend
this time and,<br>
>>>>>>>> in some
scenarios, waste disk space by inter-mediate
files.<br>
>>>>>>> FWIW, dwp
(llvm-dwp hasn't really been optimized
compared to binutils<br>
>>>>>>> dwp) is designed
to be very quick - by not needing to do a lot
of<br>
>>>>>>> parsing/fixups.
Which, yes, means larger output files than
would be<br>
>>>>>>> possible with
more parsing/etc. It also doesn't take any
input from<br>
>>>>>>> the linker (so it
can run in parallel with the linker) - so it
can't<br>
>>>>>>> remove dead
subprograms. Given Google's the major (perhaps
only<br>
>>>>>>> significant?)
user of Split DWARF - I can say that the needs
don't<br>
>>>>>>> necessarily
overlap well with something that would take
significantly<br>
>>>>>>> longer to run or
use significantly more memory.
Faster/cheaper/with<br>
>>>>>>> somewhat bigger
output files is probably the right tradeoff
for<br>
>>>>>>> Google's use
case, at least.<br>
>>>>>>><br>
>>>>>>> I imagine Apple's
use for dsymutil is somewhat similar - it's
not used<br>
>>>>>>> in the iterative
development cycle, only in final releases -
well,<br>
>>>>>>> maybe their
situation is more "neutral" - not a major pain
point in<br>
>>>>>>> any case I'd
guess.<br>
>>>>>>><br>
>>>>>>><br>
>>>>>> I see. FWIW,
Comparison splitdwarf+dwp and DWARFLinker from
lld:<br>
>>>>>><br>
>>>>>> 1.
split-dwarf+llvm-dwp = linking time for clang
6 sec,<br>
>>>>>> generating time
for .dwp 53 sec, clang=997M clang.dwp=1.1G.<br>
>>>>> FWIW, llvm-dwp is not
very well optimized (which is to say: it is
not<br>
>>>>> optimized), binutils dwp
might be a better comparison (& even that<br>
>>>>> doesn't have the
parallelism & some potential further
memory savings<br>
>>>>> that lld has that we
could take advantage of in a dwp-like tool)<br>
>>>>><br>
>>>>> What build mode was the
clang binary built in? Optimized or
unoptimized?<br>
>>>> right, that is unoptimized
build with -ffunction-sections.<br>
>>>><br>
>>>>>> 2. DWARFLinker from
lld = linking time for clang 72 sec,
clang=760M.<br>
>>> And this is without Split DWARF?
Without linker DWARF compression? -<br>
>>> that seems quite a bit
surprising, that the deduplication of DWARF<br>
>>> could fit into less space than
the wasted/reclaimed space in ranges (&<br>
>>> line)?<br>
>> that was without split dwarf, without
linker compression.<br>
>><br>
>>> Could you double check these
numbers & provide a clearer summary?<br>
>> sure, I would re-check it.<br>
>><br>
>>> Here's my attempt at numbers (all
with function-sections+gc-sections)...<br>
>>><br>
>>> Split DWARF tests didn't seem
meaningful - gc-debuginfo + split DWARF<br>
>>> seemed to drop all the debug info
(except gdb_index) so wasn't<br>
>>> working/comparison wasn't
meaningful for Apples to Apples, but<br>
>>> included it for comparing gc'd
non-split to non-gc'd split (disabled<br>
>>> gnu-pubnames/gdb-index
(-gsplit-dwarf -gno-gnu-pubnames) (which turns<br>
>>> on by default with Split DWARF
because gdb needs it - but a bit of an<br>
>>> unfair comparison without turning
on gnu-pubnames/gdb-index in other<br>
>>> build modes too, since it...
/shouldn't/ be necessary) which might've<br>
>>> been a factor in the data you
were looking at)<br>
>> that might be the case. i.e.
clang=997M for split dwarf(from my previous<br>
>> measurement) might include
gnu-pubnames.<br>
>><br>
>> would recheck it and if that is the
case then it is an unfair comparison.<br>
>><br>
>><br>
>> My point was that "DWARFLinker from
lld" takes less space than singleton<br>
>> split dwarf file+.dwp file.<br>
>><br>
>> for -O0 uncompressed:<br>
>><br>
>> - .dwp took 1.1G(if I built it
correctly), singleton clang(from your<br>
>> measurements) 566 MB<br>
>><br>
>> overall 1.6G.<br>
> Oh, yeah, even if there are some
measurement issues, linked executable<br>
> + .dwp is going to be larger than a
linked executable using non-split<br>
> DWARF (in v5), since v5 uses all the same
representations as non-split<br>
> DWARF, and split DWARF adds the
indirection overhead of a split file,<br>
> etc.<br>
><br>
> Even without DWARF linking, it's true
that split DWARF has overhead<br>
> (dwp+executable will be larger than
executable non-split).<br>
><br>
> But maybe we've ended up down a bit of a
tangent in any case.<br>
><br>
> Trying to bring this back to "should this
be committed to lld" seems<br>
> valuable, and I'm not sure what the right
criteria are for that.<br>
I think it would be useful to do "removing
obsolete debug info"<br>
in the linker. First thing is that it would be
the fastest way(no need<br>
to copy data/create temp files/built address
map...) Second thing<br>
is that it would be a good separation of
concepts. All debug info<br>
processing, currently done in the
linker(gdb_index, upcoming<br>
debug_names), could be moved into separate
library processing<br>
debug info. When gdb_index/debug_names should
be built without<br>
"removing of obsolete debug info" it would
have the same<br>
performance results as it currently has.<br>
<br>
We decided to give the idea of "removing of
obsolete debug info"<br>
another try and are going to implement it as a
separate utility<br>
working with built binary. Making it to be
multi-thread would<br>
probably show better performance results and
then it could<br>
probably be considered as acceptable to use
from the linker.<br>
<br>
</blockquote>
<div><br>
</div>
<div>I'm quite interested in this direction. One
thought I had was to incorporate such a
library into dsymutil but with support for
ELF. If you get a proposal written up I'd love
to take a look and comment.</div>
<div><br>
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
yes, I would share the proposal in a separate thread
within a week or two.<br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>Excellent, thanks :)</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div> Shortly: we decided to move in slightly other
direction than adding this functionality <br>
into dsymutil. Though if there is a preference to
implement it as part of dsymutil <br>
we are OK to do this way.<br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>I have a vague preference since a lot of
functionality already exists there on one platform and
extending that seems straight forward, however...</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div> In its first version, this new utility supposed to
receive built binary with debug info <br>
as input(with the new marking for references to
removed code sections -1/-2 <br>
-<a href="https://reviews.llvm.org/D84825"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D84825</a>)
and create a new binary with removed obsolete <br>
debug info according to the above marking. In later
versions, it could be extended <br>
with other debug info optimization tasks, e.g.
generating new index tables, optimizing debug info, <br>
etc.<br>
<br>
We considered three options:<br>
<br>
1. add new functionality into dsymutil. So that
dsymutil behaves differently <br>
on a non-darwin platform and supports another set
of command-line options.<br>
<br>
2. add new functionality into llvm-objcopy.
llvm-objcopy already supports various <br>
binary objects formats(MachO,ELF,COFF,wasm). It
also has several options <br>
to work with debug-info.<br>
<br>
3. create new utility llvm-dwarfutil which would
implement the above functionality <br>
and reuse DWARFLinker(extracted from dsymutil)
library and new library <br>
ObjectCopy(extracted from llvm-objcopy).<br>
<br>
So far our preference is number three. The reason for
this is that separate <br>
utility specifically working with debug info looks as
good separation of concepts. <br>
Adding another behavior to dsymutil looks not very
good. Extending the already <br>
rich interface of llvm-objcopy looks also not very
good. Having in mind that actual <br>
implementation would be shared by libraries, the
separate utility, working specifically <br>
with debug info, looks like the right choice. That is
our current idea. <br>
<p>I would publish the proposal shortly to discuss it.<br>
</p>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>These are solid arguments - in particular, I agree
with not extending llvm-objcopy :)</div>
<div><br>
</div>
<div><a class="gmail_plusreply"
id="gmail-m_144640436407649066plusReplyChip-0"
href="mailto:jonas@devlieghere.com" target="_blank"
moz-do-not-send="true">+Jonas Devlieghere</a> and <a
class="gmail_plusreply"
id="gmail-m_144640436407649066plusReplyChip-1"
href="mailto:aprantl@apple.com" target="_blank"
moz-do-not-send="true">+Adrian Prantl</a> for dsymutil
comments.</div>
<div><br>
</div>
<div>My personal thought would be that extending dsymutil
should be ok as the functionality goes well with
everything else dsymutil does (other than not support
ELF which the dsymutil maintainers are on board with
last I checked). That said, I definitely think a
write-up will be helpful. No matter what I support
extracting all of the behavior into libraries and using
that somewhere :)</div>
<div><br>
</div>
<div>Thanks!</div>
<div><br>
</div>
<div>-eric</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div> Thank you, Alexey.<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>Thanks!</div>
<div><br>
</div>
<div>-eric</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
Alexey.<br>
<br>
><br>
> Ray's the best person to weigh in on
that. My 2c is that I think it<br>
> probably is worthwhile, even just as an
experiment, assuming it's not<br>
> too intrusive to lld.<br>
><br>
>> - The "DWARFLinker from lld" 820
MB(from your measurements).<br>
>><br>
>><br>
>> So "DWARFLinker from lld" looks two
times better.<br>
>><br>
>><br>
>> Anyway, thank you for pointing me to
possible mistake. I would recheck<br>
>> it and update results.<br>
>><br>
>><br>
>> Alexey.<br>
>><br>
>><br>
>>> * -O0: (baseline, just using
strip -g: 356 MB)<br>
>>> * compressed: 25% smaller
with gc-debuginfo (481 MB / 641 MB) (407<br>
>>> MB split/non-gc)<br>
>>> * uncompressed: 30% smaller
(820 MB / 1.2 GB) (566 MB split/non-gc)<br>
>>> * -O3: (baseline: 116 MB)<br>
>>> * compressed: 16% smaller
(361 MB / 462 MB) (283 MB split/non-gc)<br>
>>> * uncompressed: 22% smaller
(1022 MB / 1.2 GB) (156 MB split/non-gc)<br>
>>><br>
>>><br>
>>><br>
>>><br>
>>> On Fri, Jun 26, 2020 at 9:28 AM
Alexey Lapshin<br>
>>> <<a
href="mailto:alapshin@accesssoftek.com"
target="_blank" moz-do-not-send="true">alapshin@accesssoftek.com</a>>
wrote:<br>
>>>>>>>>>>>> This idea goes in
another direction than fragmenting dwarf<br>
>>>>>>>>>>>> using elf
sections&tricks. It seems to me that the
cost of fragmenting is too high.<br>
>>>>>>>>>>> I
tend to agree - but I'm sort of leaning
towards trying to use object<br>
>>>>>>>>>>>
features as much as possible, then
implementing just enough custom<br>
>>>>>>>>>>>
handling in the linker to recoup overhead,
etc. (eg: add some kind of<br>
>>>>>>>>>>>
small header/brief description that makes it
easy for the linker to<br>
>>>>>>>>>>>
slice-and-dice - but hopefully a
domain-specific such header can be a<br>
>>>>>>>>>>>
bit more compact than the fully general ELF
form)<br>
>>>>>>>>>> I
think this indeed should be implemented and
evaluated.<br>
>>>>>>>>>> So
that various approaches could be compared.<br>
>>>>>>>>>><br>
>>>>>>>>>>>> It is not only the
sizes of structures describing fragments but
also the complexity<br>
>>>>>>>>>>>> of tools that should be
taught to work with fragmented DWARF.<br>
>>>>>>>>>>>> (f.e. llvm-dwarfdump
applied to object file should be able to read
fragmented DWARF,<br>
>>>>>>>>>>>> but applied to linked
executable it should work with non-fragmented
DWARF).<br>
>>>>>>>>>>>> That idea is for the
tool which works the same way as dsymutil ODR.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> I will shortly describe
the idea of making DWARF be easier processed
by dsymutil/DWARFLinker:<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> The idea is to have
only one "type table" per object file(special
section .debug_types_table).<br>
>>>>>>>>>>>> This "type table" would
contain all types.<br>
>>>>>>>>>>>> There could be a
special type of reference - type_offset - that
offset points into the type table.<br>
>>>>>>>>>>>> Basic types could
always be placed into the start of "type
table" thus, offsets to basic types<br>
>>>>>>>>>>>> most often would be 1
byte. There also would be a special kind of
reference - reference inside the type.<br>
>>>>>>>>>>>> Type units sig8 system
- would not be used to reference types.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> Types deduplication is
assumed to be done, not by linker mechanism
for COMDAT,<br>
>>>>>>>>>>>> but by a tool like
dsymutil. This tool would create resulting
.debug_types_table by putting there<br>
>>>>>>>>>>>> types from source
.debug_types_table-s. Only one copy of the
type would be placed into the<br>
>>>>>>>>>>>> resulting table. All
references pointing to the deleted copy would
be corrected to point<br>
>>>>>>>>>>>> to the single copy
inside "type table". (that is how dsymutil
works currently)<br>
>>>>>>>>>>> ^
that's the step that's probably a bit
expensive for a general-use<br>
>>>>>>>>>>>
tool - it implies parsing all the DWARF to
find those references and<br>
>>>>>>>>>>>
rewrite them, I think. For a high-performance
solution that could be<br>
>>>>>>>>>>>
run by the linker I think it'd be necessary to
have a solution that<br>
>>>>>>>>>>>
doesn't involve parsing all the DIEs.<br>
>>>>>>>>>>
According to the current dsymutil processing,<br>
>>>>>>>>>>
exactly this process is not the most
time-consuming.<br>
>>>>>>>>>> That
could be done relatively fast.<br>
>>>>>>>>> Fair
enough - though I'd still imagine any solution
that involves<br>
>>>>>>>>> parsing
all the DIEs still wouldn't be fast enough
(maybe an order of<br>
>>>>>>>>> magnitude
faster than the current solution even - but
that's stuill,<br>
>>>>>>>>> what, 6
or 7x slower than linking without the
feature?) for most users<br>
>>>>>>>>> to
consider it a good trade-off.<br>
>>>>>>>> It seems to
me that even the current 6x-7x slowdown could
be useful.<br>
>>>>>>>> Users who
already use dsymutil or llvm-dwp(assuming
DWARFLinker<br>
>>>>>>>> would be
taught to work with a split dwarf) tools spend
this time and,<br>
>>>>>>>> in some
scenarios, waste disk space by inter-mediate
files.<br>
>>>>>>> FWIW, dwp
(llvm-dwp hasn't really been optimized
compared to binutils<br>
>>>>>>> dwp) is designed
to be very quick - by not needing to do a lot
of<br>
>>>>>>> parsing/fixups.
Which, yes, means larger output files than
would be<br>
>>>>>>> possible with
more parsing/etc. It also doesn't take any
input from<br>
>>>>>>> the linker (so it
can run in parallel with the linker) - so it
can't<br>
>>>>>>> remove dead
subprograms. Given Google's the major (perhaps
only<br>
>>>>>>> significant?)
user of Split DWARF - I can say that the needs
don't<br>
>>>>>>> necessarily
overlap well with something that would take
significantly<br>
>>>>>>> longer to run or
use significantly more memory.
Faster/cheaper/with<br>
>>>>>>> somewhat bigger
output files is probably the right tradeoff
for<br>
>>>>>>> Google's use
case, at least.<br>
>>>>>>><br>
>>>>>>> I imagine Apple's
use for dsymutil is somewhat similar - it's
not used<br>
>>>>>>> in the iterative
development cycle, only in final releases -
well,<br>
>>>>>>> maybe their
situation is more "neutral" - not a major pain
point in<br>
>>>>>>> any case I'd
guess.<br>
>>>>>>><br>
>>>>>>><br>
>>>>>> I see. FWIW,
here is a comparison of split-dwarf+dwp and DWARFLinker from
lld:<br>
>>>>>><br>
>>>>>> 1.
split-dwarf+llvm-dwp = linking time for clang
6 sec,<br>
>>>>>> generating time
for .dwp 53 sec, clang=997M clang.dwp=1.1G.<br>
>>>>> FWIW, llvm-dwp is not
very well optimized (which is to say: it is
not<br>
>>>>> optimized), binutils dwp
might be a better comparison (& even that<br>
>>>>> doesn't have the
parallelism & some potential further
memory savings<br>
>>>>> that lld has that we
could take advantage of in a dwp-like tool)<br>
>>>>><br>
>>>>> What build mode was the
clang binary built in? Optimized or
unoptimized?<br>
>>>> right, that is unoptimized
build with -ffunction-sections.<br>
>>>><br>
>>>>>> 2. DWARFLinker from
lld = linking time for clang 72 sec,
clang=760M.<br>
>>>>> It does seem a tad
strange that the clang binary would be smaller<br>
>>>>> non-split with DWARF
linking than it was split. Though I could
imagine<br>
>>>>> this might be possible in
an optimized build (where debug_ranges<br>
>>>>> become relatively
expensive in the .o file contribution with<br>
>>>>> Split DWARF)<br>
>>>>> Could you compare the
section sizes between these two clang
binaries, perhaps?<br>
>>>> .debug_ranges is three times
bigger and .debug_line is twice bigger.<br>
>>>><br>
>>>>>>>> Thus if they
would use this LLD feature in its current
state<br>
>>>>>>>> - they would
still receive benefits.<br>
>>>>>>>><br>
>>>>>>>> Speaking of
performance results - LLD is a multi-thread
linker;<br>
>>>>>>>> it handles
sections in parallel. DWARFLinker generates
DWARF using<br>
>>>>>>>> AsmPrinter
which is a stream - so it can produce the resulting
DWARF only<br>
>>>>>>>> sequentially.
It is not surprising that the parallel
solution works faster.<br>
>>>>>>>> Making
DWARFLinker truly multi-threaded would
probably allow us<br>
>>>>>>>> to bring the
slowdown into the 2x-4x range.<br>
>>>>>>> *nod* that's
still a really expensive link - but I
understand that's a<br>
>>>>>>> suitable tradeoff
for your users<br>
>>>>>>><br>
>>>>>> Btw, 2x or 7x is for
pure linking time. Overall compilation
slowdown<br>
>>>>>> is not so
significant. Building LLVM codebase has only
20% slowdown.<br>
>>>>> Understood - that's still
quite significant to most users, I'd imagine.<br>
>>>> I see.<br>
>>>><br>
>>>>>>>>>>
Anyway, I think the dsymutil approach is still
valuable, and it<br>
>>>>>>>>>> would
be useful to optimize it.<br>
>>>>>>>>>> Do
you think it would be useful to make
dsymutil/DWARFLinker truly multi-threaded?<br>
>>>>>>>>>> (To
make dsymutil/DWARFLinker able to process each
object file in a separate thread)<br>
>>>>>>>>> Perhaps -
that I'd probably leave up to the folks who
are more<br>
>>>>>>>>> invested
in dsymutil (Adrian Prantl et al). Maybe one
day we'll get it<br>
>>>>>>>>>
integrated into llvm-dwp and then I'll be
interested in getting as<br>
>>>>>>>>> much
performance out of it as lld - so
multithreading and things would<br>
>>>>>>>>> be on the
books.<br>
>>>>>>>> I think
improving dsymutil is a valuable thing.<br>
>>>>>>>> Though there
are several directions which might be
considered<br>
>>>>>>>> to make it
more robust:<br>
>>>>>>>><br>
>>>>>>>> 1. support of
latest DWARF - DWARF5/DWARF64...<br>
>>>>>>> I expect/thought
some of the Apple folks had already worked on
DWARF5 support?<br>
>>>>>>> DWARF64 - that's
been around for a while, and just hasn't been
needed<br>
>>>>>>> by LLVM users
thus far, it seems (until recently - where
some<br>
>>>>>>> developers have
started working on that)<br>
>>>>>> The debug_names table
is already implemented, but
debug_rnglists,<br>
>>>>>> debug_loclists, type
units - are not implemented yet.<br>
>>>>> Superficially, type units
wouldn't be on the list of features (like<br>
>>>>> DWARF64 - it's optional)
I'd try to support in dsymutil - since their<br>
>>>>> size overhead is more
justified for a DWARF-agnostic linker that's<br>
>>>>> using comdat groups. With
a DWARF-aware linker I'd be specifically<br>
>>>>> hoping to avoid using
type units to help<br>
>>>>>> The thing which<br>
>>>>>> should probably be
changed is that dsymutil should not have its
version<br>
>>>>>> of code generating
DWARF tables. It should call the already existing<br>
>>>>>> DWARF5/DWARF64
implementations. Then dsymutil would always<br>
>>>>>> use the latest DWARF
generators.<br>
>>>>> Possibly - I don't know
what the architectural tradeoffs for that look<br>
>>>>> like - I'd imagine
DWARFLinker has sufficiently different<br>
>>>>> needs/tradeoffs than
LLVM's DWARF generation code (rewriting
existing<br>
>>>>> DIEs compared to building
new ones from scratch, etc) that it might be<br>
>>>>> hard for them to share a
lot of their implementation.<br>
>>>> It is not easy, and would
require some additions, but it would benefit<br>
>>>> in that all format
implementation is in one place. Thus changing
that place<br>
>>>> would reflect in other
places. There are at least three
implementations for<br>
>>>> .debug_ranges, .debug_aranges
currently...<br>
>>>><br>
>>>><br>
>>>>>>>> 2. implement
multi-threaded execution.<br>
>>>>>>>> 3. support of
split DWARF.<br>
>>>>>>> Maybe, though I'm
still not sure it'd be the right tradeoff -<br>
>>>>>>> especially if it
involved having to wait to run the .dwo merger
(call<br>
>>>>>>> it DWARF-aware
dwp, or dsymutil with dwp support) until after
the<br>
>>>>>>> linker ran.<br>
>>>>>>><br>
>>>>>>>> 4. implement
dsymutil for non-Darwin platforms.<br>
>>>>>>> That's probably,
essentially (3), more-or-less. Split DWARF is<br>
>>>>>>> somewhat of a
formalization of Apple's/MachO DWARF
distribution model<br>
>>>>>>> (leave the DWARF
in files that aren't linked/use them from a
debugger,<br>
>>>>>>> but also be able
to merge them into some final file (dsym or
dwp) for<br>
>>>>>>> archival
purposes)<br>
>>>>>>><br>
>>>>>>>> All of this
is a massive piece of work.<br>
>>>>>>>> Our original
investment was to solve two problems:<br>
>>>>>>>><br>
>>>>>>>> 1. Overlapped
address ranges, which is currently close to
being solved. Thank you for helping with that!<br>
>>>>>>> Yeah, again,
sorry that's taken quite so long/somewhat
circuitous route.<br>
>>>>>>><br>
>>>>>>>> 2. Size of
debug info. That is still an issue, but
we are unsure whether we are ready to<br>
>>>>>>>> invest
in solving all of problems 1-4 above and how
much the community is interested in it.<br>
>>>>>>> Fair, for sure -
I don't think you'd need to sign up to solve
all of<br>
>>>>>>> them (don't think
they necessarily need solving). Potentially
moving<br>
>>>>>>> the logic out
into a separate tool as Fangrui's considering
- a<br>
>>>>>>> post-link DWARF
optimizer, rather than in-linker DWARF
optimization.<br>
>>>>>>><br>
>>>>>>> I really don't
want to give you the runaround like this - but
multiple<br>
>>>>>>> times slower
links is something that seems pretty
problematic for most<br>
>>>>>>> users, to the
point of weighing the maintainability of lld
against the<br>
>>>>>>> convenience of
having this functionality in-linker rather
than in a<br>
>>>>>>> post-link
optimizer.<br>
>>>>>>><br>
>>>>>>> (I know you've
spoken a bit before about your users' needs -
but if<br>
>>>>>>> it's possible,
could you explain (again :/) why they have
such a<br>
>>>>>>> strong need for
smaller DWARF? While DWARF size is an ongoing
concern<br>
>>>>>>> for many users
(Google certainly - hence the invention of
Split DWARF,<br>
>>>>>>> use of type units
and compressed DWARF, etc) - usually it's in
rather<br>
>>>>>>> large programs,
but it sounds like you're dealing with
relatively<br>
>>>>>>> small ones
(otherwise the increase in link time, I'd
imagine, would be<br>
>>>>>>> prohibitive for
your users?)?<br>
>>>>>> We have many large
programs and keep daily/nightly debug builds,<br>
>>>>>> which take a lot of
disk space. Compilation time for these
programs is long.<br>
>>>>>> The scenario is
"compile once".(not
compile-debug-compile-debug).<br>
>>>>>> So we think that a
solution (like dsymutil/DWARFLinker) would not
slow down<br>
>>>>>> the compilation time
of overall build significantly(see above
numbers for<br>
>>>>>> llvm codebase) and
would allow us to reduce disk space required
to keep<br>
>>>>>> all of these builds.<br>
>>>>> Ah, OK - for archival
purposes. So the interactive developers
wouldn't<br>
>>>>> necessarily be using this
feature. Makes sense - similar to dsymutil<br>
>>>>> and dwp, mostly used for
archival purposes & you can debug straight<br>
>>>>> from .o/.dwos for
interactive/iterative development.<br>
>>>><br>
>>>>> In that case, it seems
more likely that a separate tool might
suffice.<br>
>>>> Agreed: if the work on this
continues, then it makes sense to<br>
>>>> do it as a separate tool and make
it fast enough. If there is interest<br>
>>>> in it, then it would
probably be possible to return to the idea of calling
it from the linker.<br>
>>>><br>
>>>>> Also, out of curiosity -
have you tried just compressing the output<br>
>>>>> (-gz (I think that does
the right thing for the linker level<br>
>>>>> compression too,
otherwise -Wl,-compress-debug-sections might
do it))<br>
>>>>> or are you already doing
that in addition?<br>
>>>> sure. we use
-Wl,-compress-debug-sections.<br>
>>>><br>
>>>> Thank you, Alexey.<br>
>>>><br>
>>>>>>> You mentioned
that the usability cost of<br>
>>>>>>> Split DWARF for
your users was too high (or high enough to
justify<br>
>>>>>>> this alternative
work of DWARF-aware linking)? That all seems a
bit<br>
>>>>>>> surprising to me
- though I understand the deployment issues of
Split<br>
>>>>>>> DWARF do present
some challenges to users in more heterogenous<br>
>>>>>>> environments than
Google's... still, I'd have thought there was
some<br>
>>>>>>> hope there)<br>
>>>>>> Our tools do not
support split dwarf yet. Though we plan to
implement it.<br>
>>>>>> Once we have
support for split dwarf, it would be<br>
>>>>>> convenient to have
an easy way to share built debug binaries.
llvm-dwp is the<br>
>>>>>> answer to this.
DWARFLinker could probably be another answer.<br>
>>>>> Ah, fair enough - thanks
for the context!<br>
>>>>>>>>> One way
to do that would be to have a CU-local type
indirection table.<br>
>>>>>>>>> DIEs
reference local type numbers (like local
address/string numbers -<br>
>>>>>>>>>
addrx/strx/rnglistx) and that table contains
either sig8 (no linker<br>
>>>>>>>>> fixups
required) or the local type offsets you
describe - the linker<br>
>>>>>>>>> would
then only need to read this type number
indirection table and<br>
>>>>>>>>> rewrite
them to the final type numbers.<br>
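A rough sketch of the CU-local type indirection idea above, in Python rather than real DWARF. DIEs are assumed to hold only small local type numbers, so the linker touches just the per-CU tables and never the DIEs. The function name `link_type_tables` and the list-of-signatures layout are hypothetical, not an existing LLVM interface.<br>
<pre>
```python
def link_type_tables(cu_tables):
    """Merge per-CU type indirection tables into one global type list.

    cu_tables: one table per CU; each table is a list of type signatures
    (stand-ins for sig8 values) indexed by the CU-local type number.
    Returns (global_types, remapped_tables), where remapped_tables[i][n]
    is the final type number for local type number n of CU i.
    """
    global_index = {}      # signature -> final type number
    global_types = []      # final type number -> signature
    remapped_tables = []
    for table in cu_tables:
        remapped = []
        for signature in table:
            if signature not in global_index:
                # First time this type is seen: give it a final number.
                global_index[signature] = len(global_types)
                global_types.append(signature)
            remapped.append(global_index[signature])
        remapped_tables.append(remapped)
    return global_types, remapped_tables
```
</pre>
For example, two CUs with tables `["A", "B"]` and `["B", "C", "A"]` end up sharing three final types, and only the two small tables are rewritten.<br>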
>>>>>>>> Yes, that
could be additionally done if this process
would be time-consuming.<br>
>>>>>>>><br>
>>>>>>>> David, thank
you for all your comments and explanations.
They are extremely helpful.<br>
>>>>>>> Sure thing -
really appreciate your patience with all this
- it's... a<br>
>>>>>>> lot of moving
parts.<br>
>>>>>>> - Dave<br>
>>>>>>> Thank you,
Alexey.<br>
>>>>>>><br>
>>>>>>>> sig8 hash-id
would be used to compare types and to
deduplicate them.<br>
>>>>>>>> It would
speed up the current dsymutil context
analysis.<br>
>>>>>>>> Types having
the same hash-id could be deduplicated.<br>
>>>>>>>> This would
allow deduplicating more types
than the current dsymutil does.<br>
>>>>>>>> Incomplete
type definitions having a similar set of
members are not deduplicated by dsymutil
currently.<br>
>>>>>>>> In this case
they would have the same hash-id.<br>
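A small illustration of deduplicating by hash-id, assuming a type can be reduced to its fully qualified name plus a list of members. The hash below is only a stand-in (truncated MD5 over a made-up text encoding), not the DWARF type signature algorithm, and `type_hash_id` and `deduplicate` are hypothetical names.<br>
<pre>
```python
import hashlib


def type_hash_id(qualified_name, members):
    """Stand-in hash-id: depends on the name and the (type, name) members."""
    encoded = qualified_name + "|" + ";".join(f"{t} {n}" for t, n in members)
    return hashlib.md5(encoded.encode("utf-8")).hexdigest()[:16]


def deduplicate(types):
    """Keep one copy per hash-id, without comparing type structure."""
    unique = {}
    for qualified_name, members in types:
        hash_id = type_hash_id(qualified_name, members)
        # setdefault keeps the first copy and ignores later duplicates.
        unique.setdefault(hash_id, (qualified_name, members))
    return list(unique.values())
```
</pre>
Two copies of `ns::Foo` with the same members collapse into one entry, while a `ns::Foo` with a different member list keeps its own entry.<br>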
>>>>>>>><br>
>>>>>>>> This "type
table" would take less space than current
"type units" and current ODR solution.<br>
>>>>>>>><br>
>>>>>>>> Above is just
an idea on how to help DWARF-aware
linker(based on idea removing obsolete debug
info)<br>
>>>>>>>> to work
faster(if that is interesting).<br>
>>>>>>>><br>
>>>>>>>> Alexey.<br>
>>>>>>>><br>
>>>>>>>>> From:
llvm-dev <<a
href="mailto:llvm-dev-bounces@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev-bounces@lists.llvm.org</a>>
On Behalf Of James Henderson via llvm-dev<br>
>>>>>>>>> Sent:
Wednesday, June 3, 2020 3:48 AM<br>
>>>>>>>>> To: David
Blaikie <<a
href="mailto:dblaikie@gmail.com"
target="_blank" moz-do-not-send="true">dblaikie@gmail.com</a>><br>
>>>>>>>>> Cc: <a
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
>>>>>>>>> Subject:
Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove
obsolete debug info in lld.<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> It makes
me sad that the linker (via a library or
otherwise) has to be "DWARF-aware" to be able
to effectively handle --gc-sections, COMDATs,
--icf etc for debug info, without leaving
large blocks of data kicking around.<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> The
patching to -1 (or equivalent) is probably a
good lightweight solution (though I'd love it
if it could be done based on section type in
the future rather than section name, but
that's probably outside the realm of DWARF),
as it requires only minimal understanding in
the linker, but anything beyond that seems to
be complicated logic that is mostly due to the
structure of DWARF. Patching to -1 does feel a
bit like a sticking plaster/band aid to patch
over the issue rather than properly solving it
too - there will still be debug data
(potentially significant amounts in
COMDAT-heavy objects) that the linker has to
write and the debugger has to somehow know how
to skip (even if it knows that -1 is
special-case due to the standard being
updated, it needs to get as far as the -1),
which is all wasted effort.<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> We've
already seen from Alexey's prototyping, and
from our own experiences with the Sony
proprietary linker (which tried to rewrite
.debug_line only) that deconstructing the
DWARF so that it can be more optimally
reassembled at link time is slow going, and
will probably inevitably be however much
effort is put into optimising it. For a start,
given the current standards, it's impossible
to know how to deconstruct it without having
to parse vast amounts of DWARF, which is
typically going to mean a lot more parsing
work than the linker would normally have to
deal with. Additionally, much of this parsing
work is wasted effort, since it seems unlikely
in many links that large amounts of the DWARF
will be redundant. Having an option to opt-in
doesn't help much there, since it just means
the logic exists without most people using it,
due to it not being good enough, or
potentially they don't even know it exists.<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> I don't
have particularly concrete suggestions as to
how to solve the structural problems with
DWARF at this point. The only thing that seems
obvious to me is a more "blessed" approach to
fragmentation of sections, similar to what I
tried with my prototype mentioned earlier in
the thread, although we'd need to figure out
the previously stated performance issues.
Other ideas might tie into this, like somehow
sharing the various table headers a bit like
CIEs in .eh_frame that could be merged by the
linker - each object could have separate table
header sections, which are referenced by the
individual .debug_* blocks, which in turn are
one per function/data piece and easily
discardable/merged by the linker.<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> Just some
thoughts.<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> James<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> On Tue, 2
Jun 2020 at 19:24, David Blaikie via llvm-dev
<<a href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
wrote:<br>
>>>>>>>>><br>
>>>>>>>>> On Tue,
May 19, 2020 at 7:17 AM Alexey Lapshin<br>
>>>>>>>>> <<a
href="mailto:alapshin@accesssoftek.com"
target="_blank" moz-do-not-send="true">alapshin@accesssoftek.com</a>>
wrote:<br>
>>>>>>>>>> Hi
David, please find my comments inside:<br>
>>>>>>>>>><br>
>>>>>>>>>><br>
>>>>>>>>>>>>> Broad question: Do
you have any specific motivation/users/etc in
implementing this (if you can speak about it)?<br>
>>>>>>>>>>>>> - it might help
motivate the work, understand what tradeoffs
might be suitable for you/your users, etc.<br>
>>>>>>>>>>>> There are two general
requirements:<br>
>>>>>>>>>>>> 1) Remove (or clean)
invalid debug info.<br>
>>>>>>>>>>>
Perhaps a simpler direct solution for your
immediate needs might be a much narrower,<br>
>>>>>>>>>>>
and more efficient linker-DWARF-awareness
feature:<br>
>>>>>>>>>>><br>
>>>>>>>>>>>
With DWARFv5, rnglists present an opportunity
for a DWARF linker to rewrite the ranges<br>
>>>>>>>>>>>
without parsing the rest of the DWARF.
/technically/ this isn't guaranteed - rnglist
entries<br>
>>>>>>>>>>>
can be referenced either directly, or by
index. If all rnglists are referenced by
index, then<br>
>>>>>>>>>>> a
linker could parse only the debug_rnglists
section and rewrite ranges to remove any<br>
>>>>>>>>>>>
address ranges that refer to optimized-out
code.<br>
>>>>>>>>>>><br>
>>>>>>>>>>>
This would only be correct for rnglists that
had no direct references to them (that only
were<br>
>>>>>>>>>>>
referenced via the indexes) - but we could
either implement it with that assumption, or
could<br>
>>>>>>>>>>>
add an LLVM extension attribute on the CU that
would say "I promise I only referenced
rnglists<br>
>>>>>>>>>>>
via rnglistx forms/indexes"). If this
DWARF-aware linking would have to read the CU
DIE (not<br>
>>>>>>>>>>>
all the other DIEs) it /could/ also then
rewrite high/low_pc if the CU wasn't using
ranges...<br>
>>>>>>>>>>>
but that wouldn't come up in the
function-removal case, because then you'd have
ranges anyway,<br>
>>>>>>>>>>>
so no need for that.<br>
>>>>>>>>>>><br>
>>>>>>>>>>>
Such a DWARF-aware rnglist linking could also
simplify rnglists, in cases where functions<br>
>>>>>>>>>>>
ended up being laid out next to each other,
the linker could coalesce their ranges
together.<br>
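A minimal sketch of the range rewriting described above, assuming ranges have already been decoded from debug_rnglists into half-open (low_pc, high_pc) pairs and that the linker can tell whether the code at an address was kept. The `is_live` callback and the function name are hypothetical; real rnglist entries have several encodings that are ignored here.<br>
<pre>
```python
def rewrite_ranges(ranges, is_live):
    """Drop ranges of discarded code, then coalesce touching ranges.

    ranges:  list of half-open (low_pc, high_pc) pairs.
    is_live: callable taking a low_pc and returning True if that code
             survived linking (hypothetical linker query).
    """
    kept = sorted(r for r in ranges if is_live(r[0]))
    merged = []
    for low, high in kept:
        if merged and merged[-1][1] >= low:
            # Adjacent or overlapping with the previous range: extend it.
            prev_low, prev_high = merged[-1]
            merged[-1] = (prev_low, max(prev_high, high))
        else:
            merged.append((low, high))
    return merged
```
</pre>
With functions laid out back to back at 0x1000 and 0x1010, and one discarded function whose range was left at address 0, the result is a single range covering both live functions.<br>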
>>>>>>>>>>><br>
>>>>>>>>>>> I
imagine this could be implemented with very
little overhead to linking, especially
compared<br>
>>>>>>>>>>>
to the overhead of full DWARF-aware linking.<br>
>>>>>>>>>>><br>
>>>>>>>>>>>
Though none of this fixes Split DWARF, where
the linker doesn't get a chance to see the<br>
>>>>>>>>>>>
addresses being used - but if you only
want/need the CU-level ranges to be correct,
this<br>
>>>>>>>>>>>
might be a viable fix, and quite efficient.<br>
>>>>>>>>>> Yes,
we think about that alternative. This would
resolve our problem of invalid debug info<br>
>>>>>>>>>> and
would work much faster. Thus, if we would not
have good results for D74169 then we<br>
>>>>>>>>>> will
implement it. Do you think it could be useful
to have this solution in upstream?<br>
>>>>>>>>> A pure
rnglist rewriting - I think it'd be OK to have
in upstream -<br>
>>>>>>>>> again,
cost/benefit/etc would have to be weighed. I'm
not sure it<br>
>>>>>>>>> would
save enough space to be particularly valuable
beyond the<br>
>>>>>>>>>
correctness issue - and it doesn't completely
solve the correctness<br>
>>>>>>>>> issue for
zero-address usage or low-address usage
(because you could<br>
>>>>>>>>> still
have overlapping subprograms inside a CU - so
if you were<br>
>>>>>>>>>
symbolizing you could use the correct rnglist
to filter, but then go<br>
>>>>>>>>> look
inside the CU only to find two subprograms
that had that address<br>
>>>>>>>>> & not
know which one was the correct one and which
one was the<br>
>>>>>>>>> discarded
one).<br>
>>>>>>>>><br>
>>>>>>>>> rnglist
rewriting might be easy enough to prototype -
but depends what<br>
>>>>>>>>> you want
to spend your time on, I know this whole issue
has been a<br>
>>>>>>>>> huge
investment of your time already - but maybe
this recent<br>
>>>>>>>>>
revitalization of the conversation around
having an explicit value in<br>
>>>>>>>>> the
linker might be sufficient to address
everyone's needs... *fingers<br>
>>>>>>>>> crossed*)<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>>>>> 2) Optimize the DWARF
size.<br>
>>>>>>>>>>>
Do your users care much about this? I imagine
if they had significant DWARF size issues,<br>
>>>>>>>>>>>
they'd have significant link time issues and
the kind of cost to link time this feature has
would<br>
>>>>>>>>>>>
be prohibitive - but perhaps they're sharing
linked binaries much more often than they're<br>
>>>>>>>>>>>
actually performing linking.<br>
>>>>>>>>>> Yes,
they do. They also have significant link-time
issues.<br>
>>>>>>>>>> So
current performance results of D74169 are not
very acceptable.<br>
>>>>>>>>>> We
hope to improve it.<br>
>>>>>>>>>><br>
>>>>>>>>>><br>
>>>>>>>>>><br>
>>>>>>>>>>>> The specifics which our
users have:<br>
>>>>>>>>>>>> - embedded platform
which uses 0 as start of .text section.<br>
>>>>>>>>>>>> - custom toolset
which does not support all features yet (e.g.
split dwarf).<br>
>>>>>>>>>>>> - tolerant of the
link-time increase.<br>
>>>>>>>>>>>> - need a useful way
to share debug builds.<br>
>>>>>>>>>>>
Sharing two files (executable and dwp) is
significantly less useful than sharing one
file?<br>
>>>>>>>>>>
Probably not significantly, but yes, it looks
less useful compared to D74169.<br>
>>>>>>>>>>
Having only two files (executable and .dwp)
looks significantly better than having
executable and multiple .dwo files.<br>
>>>>>>>>>>
Having only one file(executable) with minimal
size looks better than the two files with a
bigger size.<br>
>>>>>>>>>><br>
>>>>>>>>>> clang
compiled with -gsplit-dwarf takes 0.9G for
executable and 0.9G for .dwp.<br>
>>>>>>>>>> clang
compiled with -gc-debuginfo takes only 0.76G
for single executable.<br>
>>>>>>>>>><br>
>>>>>>>>>><br>
>>>>>>>>>><br>
>>>>>>>>>>>> For the first point: we
have a problem "Overlapping address ranges
starting from 0"(D59553).<br>
>>>>>>>>>>>> We use custom solution,
but the general solution like D74169 would be
better here.<br>
>>>>>>>>>>>
If CU ranges are the only ones that need
fixing, then I think the above solution might
be as<br>
>>>>>>>>>>>
good/better - if more than CU ranges need
fixing, then I think we might want to start
talking about<br>
>>>>>>>>>>>
how to fix DWARF itself (split and non-split)
to signal certain addresses point to dead code
with a<br>
>>>>>>>>>>>
specific blessed value that linkers would need
to implement - because with Split DWARF
there's<br>
>>>>>>>>>>>
no way to solve the non-CU addresses at the
linker.<br>
>>>>>>>>>> I
think a worthwhile solution for that signal
value would be LowPC > HighPC.<br>
>>>>>>>>>> That
does not require additional bits in DWARF.<br>
>>>>>>>>>> It
would be natural to skip such address ranges
since they are explicitly marked as invalid.<br>
>>>>>>>>>> It
could be implemented in a linker very easily.
Probably, it would make sense to describe that<br>
>>>>>>>>>> usage
in DWARF standard.<br>
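To make the LowPC > HighPC proposal concrete, here is a sketch of both sides under the assumption that ranges are plain (low_pc, high_pc) pairs. The sentinel value `(1, 0)` and both function names are made up for illustration; the proposal only requires that low_pc be greater than high_pc.<br>
<pre>
```python
# Hypothetical sentinel: any pair with low_pc > high_pc means "dead code".
DEAD_RANGE = (1, 0)


def mark_dead_ranges(ranges, is_live):
    """Linker side: overwrite ranges of discarded code with the sentinel."""
    return [r if is_live(r[0]) else DEAD_RANGE for r in ranges]


def skip_dead_ranges(ranges):
    """Consumer side: ignore every range whose low_pc exceeds its high_pc."""
    return [(low, high) for low, high in ranges if high >= low]
```
</pre>
A consumer that filters this way never sees the discarded range, even on a platform where .text starts at address 0 and a zeroed range would otherwise look valid.<br>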
>>>>>>>>>><br>
>>>>>>>>>> As to
the addresses which are not seen by the
linker(since they are in .dwo files) - yes,<br>
>>>>>>>>>> they
need to have another solution. Could you show
an example of such a case, please?<br>
>>>>>>>>>><br>
>>>>>>>>>><br>
>>>>>>>>>><br>
>>>>>>>>>>>>> 2. Support of type
units.<br>
>>>>>>>>>>>>>> That could
be implemented further.<br>
>>>>>>>>>>>>> Enabling type units
increases object size to make it easier to
deduplicate at link time by a DWARF-unaware<br>
>>>>>>>>>>>>> linker. With a
DWARF aware linker it'd be generally desirable
not to have to add that object size overhead
to<br>
>>>>>>>>>>>>> get the linking
improvements.<br>
>>>>>>>>>>>> But, DWARFLinker should
adequately work with type units since they are
already implemented.<br>
>>>>>>>>>>>
Maybe - it'd be nice & all, but I don't
think it's an outright necessity - if someone
knows they're using<br>
>>>>>>>>>>> a
DWARF-aware linker, they'd probably not use
type units in their object files. It's
possible someone<br>
>>>>>>>>>>>
doesn't know for sure & maybe they have
pre-canned debug object files from someone
else, etc.<br>
>>>>>>>>>> I
see.<br>
>>>>>>>>>><br>
>>>>>>>>>>>> Another thing is that
the idea behind type units has the potential
to help Dwarf-aware linker to work faster.<br>
>>>>>>>>>>>> Currently, DWARFLinker
analyzes context to understand whether types
are the same or not.<br>
>>>>>>>>>>>
When you say "analyzes context" what do you
mean? Usually I'd take that to mean<br>
>>>>>>>>>>>
"looks at things outside the type itself -
like what namespace it's in, etc" - which,
yes,<br>
>>>>>>>>>>>
it should do that, but it doesn't seem very
expensive to do. But I guess you actually<br>
>>>>>>>>>>>
mean something about doing structural
equivalence in some way, looking at things
inside the type?<br>
>>>>>>>>>> I
think it could be useful for both cases.
Currently, dsymutil does only first thing<br>
>>>>>>>>>> (look
at type name, namespace name, etc..) and does
not do the second thing<br>
>>>>>>>>>>
(doing structural equivalence). Analyzing type
names is currently quite expensive<br>
>>>>>>>>>> (the
search in the string pool alone takes ~10 sec of the
70 sec overall time).<br>
>>>>>>>>>> That
is expensive because many things have to be
done to work with strings:<br>
>>>>>>>>>> parse
DWARF, search and resolve relocations, compute
a hash for strings,<br>
>>>>>>>>>> put
data into a string pool, create a fully
qualified name(like
namespace::function::name).<br>
>>>>>>>>>> It
looks like it could be optimized and finally
require less time, but it still would be a
noticeable<br>
>>>>>>>>>> part
of the overall time.<br>
>>>>>>>>>><br>
>>>>>>>>>> If
dsymutil starts to check for the structural
equivalence, then the process would be even
slower.<br>
>>>>>>>>>> So,
if a single hash-id were checked instead of
comparing the structure of types, then this
process<br>
>>>>>>>>>> would
also be faster.<br>
>>>>>>>>>><br>
>>>>>>>>>> Thus
I think using hash-id to compare types would
allow to make current implementation faster
and would<br>
>>>>>>>>>> allow
handling incomplete types by DWARFLinker
without massive performance degradation also.<br>
>>>>>>>>>><br>
>>>>>>>>>>>> But the context is
known when types are generated. So, no need to
spend time analyzing it.<br>
>>>>>>>>>>>> If types could be
compared without analyzing context, then
Dwarf-aware linker would work faster.<br>
>>>>>>>>>>>> That is just an
idea(not for immediate implementation): If
types would be stored in some "type table"<br>
>>>>>>>>>>>> (instead of COMDAT
section group) and could be accessed through
hash-id (like type units)<br>
>>>>>>>>>>>> - then it would be the
solution requiring fewer bits to store but
allowing to compare types<br>
>>>>>>>>>>>> by hash-id(not
analysing context).<br>
>>>>>>>>>>>> In this case, the size
increase would be small, and processing
could be faster.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> this is just an idea
and could be discussed separately from the
problem of integrating of D74169.<br>
>>>>>>>>>>>>>> 6. -flto=thin<br>
>>>>>>>>>>>>>> That
problem was described in this review <a
href="https://reviews.llvm.org/D54747#1503720"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://reviews.llvm.org/D54747#1503720</a>.
It also exists in<br>
>>>>>>>>>>>>>> current
DWARFLinker/dsymutil implementation. I think
that problem should be discussed more: it
could<br>
>>>>>>>>>>>>>> probably be
fixed by avoiding generation of such
incomplete declaration during thinlto,<br>
>>>>>>>>>>>>>> That would be
costly to produce extra/redundant debug info
in ThinLTO - actually ThinLTO could be doing<br>
>>>>>>>>>>>>>> more to reduce
that redundancy early on (actually removing
definitions from some llvm Modules if the type<br>
>>>>>>>>>>>>>> definition is
known to exist in another Module, etc)<br>
>>>>>>>>>>>>> I don't know if
it's a problem since that patch was reverted.<br>
>>>>>>>>>>>> Yes. That patch was
reverted, but this patch(D74169) has the same
problem.<br>
>>>>>>>>>>>> if D74169 would be
applied and --gc-debuginfo used then structure
type<br>
>>>>>>>>>>>> definition would be
removed.<br>
>>>>>>>>>>>> DWARFLinker could
handle that case - "removing definitions from
some llvm Modules if the type<br>
>>>>>>>>>>>> definition is known to
exist in another Module".<br>
>>>>>>>>>>>> i.e. DWARFLinker could
replace the declaration with the definition.<br>
>>>>>>>>>>>> But that problem could
be more easily resolved when debug info is
generated(probably without<br>
>>>>>>>>>>>> significant increase of
debug info size):<br>
>>>>>>>>>>>> Here we have:<br>
>>>>>>>>>>>>
DW_TAG_compile_unit(0x0000000b) - compile unit
containing concrete instance for function "f".<br>
>>>>>>>>>>>>
DW_TAG_compile_unit(0x00000073) - compile unit
containing abstract instance root for function
"f".<br>
>>>>>>>>>>>>
DW_TAG_compile_unit(0x000000c1) - compile unit
containing function "f" definition.<br>
>>>>>>>>>>>> Code for function "f"
was deleted. gc-debuginfo deletes compile unit
DW_TAG_compile_unit(0x000000c1)<br>
>>>>>>>>>>>> containing "f"
definition (since there is no corresponding
code). But it has structure "Foo" definition<br>
>>>>>>>>>>>>
DW_TAG_structure_type(0x0000011e) referenced
from DW_TAG_compile_unit(0x00000073)<br>
>>>>>>>>>>>> by declaration
DW_TAG_structure_type(0x000000ae). That
declaration is exactly the case when
definition<br>
>>>>>>>>>>>> was removed by thinlto
and replaced with declaration.<br>
>>>>>>>>>>>> Would it cost too much
if type definition would not be replaced with
declaration for "abstract instance root"?<br>
>>>>>>>>>>>> The number of concrete
instances is bigger than number of abstract
instance roots.<br>
>>>>>>>>>>>> Probably, it would not
be too costly to leave definition in abstract
instance root?<br>
>>>>>>>>>><br>
>>>>>>>>>>>> Alternatively, Would it
cost too much if type definition would not be
replaced with declaration when<br>
>>>>>>>>>>>> declaration references
type from not used function? (lto could
understand that concrete function is not
used).<br>
>>>>>>>>>>> I
don't follow this example - could you provide
a small concrete test case I could reproduce?<br>
>>>>>>>>>> I
would provide a test case if necessary. But it
looks like this issue is finally clear, and
you already commented on that.<br>
>>>>>>>>>><br>
>>>>>>>>>>>
Oh, I guess this is happening perhaps because
ThinLTO can't know for sure that a standalone<br>
>>>>>>>>>>>
definition of 'f' won't be needed - so it
produces one in case one of the inlining
opportunities<br>
>>>>>>>>>>>
doesn't end up inlining. Then it turns out all
calls got inlined, so the external definition
wasn't needed.<br>
>>>>>>>>>>>
Oh, you're suggesting that these 3 CUs got
emitted into one object file during LTO, but
that DWARFLinker<br>
>>>>>>>>>>>
drops a CU without any code in it - even
though... So far as I know, in LTO, LLVM
directly references<br>
>>>>>>>>>>>
types across units if the CUs are all emitted
in the same object file. (and if they weren't
in the same<br>
>>>>>>>>>>>
object file - then the abstract_origin
couldn't be pointing cross-CU).<br>
>>>>>>>>>>> I
guess some basic things to say:<br>
>>>>>>>>>>>
With ThinLTO, the concrete/standalone function
definition is emitted in case some call sites
don't end up<br>
>>>>>>>>>>>
being inlined. So we know it'll be emitted
(but might not be needed by the actual linker)<br>
>>>>>>>>>>>
Any number of inline calls might exist - but
we shouldn't put the type information into
those, because<br>
>>>>>>>>>>>
they aren't guaranteed to emit it (if the
inline function gets optimized away, there
would be nothing to<br>
>>>>>>>>>>>
enforce the type being emitted) - and even if
we forced the type information to be emitted
into one<br>
>>>>>>>>>>>
object file that has an inline copy of the
function - there's no guarantee that object
file will get linked in either.<br>
>>>>>>>>>>>
So, no, I don't think there's much we can do
to keep the size of object files down, while
guaranteeing<br>
>>>>>>>>>>>
the type information will be emitted with the
usual linker semantics.<br>
>>>>>>>>>> Then
dsymutil/DWARFLinker could be changed to
handle that (though it would probably not be
very efficient).<br>
>>>>>>>>>> If
ThinLTO could determine that the function ends
up unused (and so must not contain the
referenced type definition),<br>
>>>>>>>>>> then
this situation could be handled more
effectively.<br>
>>>>>>>>>><br>
>>>>>>>>>> Thank
you, Alexey.<br>
>>>>>>>>>><br>
>>>>>>>>>>>><br>
>>>>>>>>>>>><br>
>>>>>>>>>>>>
_______________________________________________<br>
>>>>>>>>>>>> LLVM Developers mailing
list<br>
>>>>>>>>>>>> <a
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
>>>>>>>>>>>> <a
href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>