<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 28.10.2020 20:38, David Blaikie
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAENS6EuOz-rVnp5ry=PCf9RUO3Rj-ZarU-jwrqGmBqj53Kjy=g@mail.gmail.com">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020 at 6:01
AM Alexey Lapshin &lt;<a href="mailto:avl.lapshin@gmail.com"
moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 28.10.2020 01:49, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
</div>
</blockquote>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Without loading all CUs
into memory, it would
require a two-pass
solution: a first pass to analyze
which parts of the DWARF
relate to live code, and
then a second pass to
generate the result. <br>
</p>
</div>
</blockquote>
<div>Not sure it'd require any
more of a second pass than a
"fixup" pass, which it sounds
like you're saying it already
has? <br>
</div>
</div>
</div>
</blockquote>
<p>It looks like it would need an
additional pass to process inter-CU
references (those that exist in the incoming file)
if we do not want to load all CUs
into memory.<br>
</p>
</div>
</blockquote>
<div>Usually inter-CU references aren't
used, except in LTO - and in LTO all the
DWARF deduplication and function
discarding is already done by the IR
linker anyway. (ThinLTO is a bit
different, but really we'd be better off
teaching it the extra tricks anyway
(some can't be fixed in ThinLTO - like
emitting a "Home" definition of an
inline function, only to find out other
ThinLTO backend/shards managed to
optimize away all uses of the
function... so some cleanup may be
useful there)). It might be possible to
do a more dynamic/rolling cache - keep
only the CUs with unresolved cross-CU
references alive and only keep them
alive until their cross-CU references
are found/marked alive. This should make
things no worse than the traditional
dsymutil case - since cross-CU
references are only effective/generally
used within a single object file (it's
possible to create relocations for them
into other files - but I know LLVM
doesn't currently do this and I don't
think GCC does it) with multiple CUs
anyway - so at most you'd keep all the
CUs from a single original input file
alive together.<br>
</div>
</div>
</div>
</blockquote>
But since this is a case documented by DWARF, the
tool should be ready for it (when inter-CU
references are heavily used).</div>
</blockquote>
<div><br>
Sure - but by implementing a CU liveness window
like that (keeping CUs live only so long as they
need to be rather than an all-or-nothing approach)
only especially quirky inputs would hit the worst
case while the more normal inputs could perform
better.<br>
</div>
</div>
</div>
</blockquote>
<p>It is not clear what should be put into such a CU liveness
window. If CU100 references CU1, how could we know that
we need to keep CU1 in the CU liveness window before we
have processed CU100?<br>
</p>
</div>
</blockquote>
<div>Fair point, not just forward references to worry about
but backward references too. I wonder how much savings there
is in the liveness analysis compared to "keep one copy of
everything, no matter whether it's live or not", then it can
be a pure forward progress situation. (with the quirk that
you might emit a declaration for an entity once, then a
definition for it later - alternatively if a declaration is
seen it could be skipped under the assumption that a
definition will follow (& use a forward ref fixup) - and
if none is found, splat some stub declarations into a
trailing CU at the end) <br>
</div>
</div>
</div>
</blockquote>
That should probably be measured, but I think we would lose most of the
size reduction<br>
(since we would start keeping unreferenced data that is currently
removed), <br>
which would lead to slower performance and greater disk space
usage.<br>
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAENS6EuOz-rVnp5ry=PCf9RUO3Rj-ZarU-jwrqGmBqj53Kjy=g@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> Moreover, llvm-dwarfutil would be the tool
producing <br>
exactly such a situation. The resulting
file (produced by llvm-dwarfutil) would contain a
lot of <br>
inter-CU references. There is probably no
practical reason to apply llvm-dwarfutil to the
same <br>
file twice, but it would be a good test for the
tool.<br>
</div>
</blockquote>
<div><br>
It'd be a good stress test, but not necessarily
something that would need to perform the best
because it wouldn't be a common use case.<br>
</div>
</div>
</div>
</blockquote>
<p>I agree that we should not slow down the DWARFLinker in
common cases only because we need to support the worst
cases.<br>
But we also need to implement a solution which works in
some acceptable manner for the worst case. </p>
</div>
</blockquote>
<div>I think that depends on "acceptable" - correct, yes.
Practical to run in reasonable time/memory? Not necessarily,
in my opinion. <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>The current solution - loading everything in memory -
makes it hard to use in a non-dsymutil
scenario (llvm-dwarfutil).<br>
</p>
</div>
</blockquote>
<div>I agree it's worth exploring the non-dsymutil scenario,
as you are - I'm just saying we don't necessarily need to
support high usability (fast/low memory usage/etc)
llvm-dwarfutil on an already dwarfutil'd binary (but as
you've pointed out, the "window" is unknowable because of
backward references, so this whole subthread is perhaps
irrelevant).<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>There could be several things which could be used to
decide whether we need to go on a light or heavy path:<br>
<br>
1. If the input contains only a single CU we do not need
to unload it from memory. Thus - we would not need to do
an extra DWARF loading pass.<br>
2. If abbreviations from the whole input file do not
contain inter-CU references then while doing liveness
analysis, we do not need to wait until other CUs are
processed.<br>
</p>
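<p>A minimal sketch of how checks (1) and (2) could be combined, assuming the abbreviation declarations have already been parsed into lists of (attribute, form) pairs. <code>DW_FORM_ref_addr</code> (0x10) is the DWARF form that can point outside the current CU; the <code>AbbrevDecl</code> model and the <code>may_have_inter_cu_refs</code> and <code>choose_path</code> helpers are hypothetical names, not DWARFLinker API:</p>
<pre>
```python
from typing import Iterable, List, Tuple

DW_FORM_ref_addr = 0x10  # reference by .debug_info offset; may cross CU boundaries
DW_FORM_ref4 = 0x13      # CU-relative reference; always stays inside its own CU

# Hypothetical model: one abbreviation declaration is a list of (attribute, form) pairs.
AbbrevDecl = List[Tuple[int, int]]


def may_have_inter_cu_refs(abbrevs: Iterable[AbbrevDecl]) -> bool:
    """Return True if any abbreviation uses a form that can reference another CU."""
    return any(form == DW_FORM_ref_addr for decl in abbrevs for _, form in decl)


def choose_path(cu_count: int, abbrevs: Iterable[AbbrevDecl]) -> str:
    """Pick the processing scheme from cheap, up-front checks."""
    if cu_count == 1:
        # (1) A single CU never has to be unloaded, so no extra loading pass.
        return "single-pass"
    # (2) Without cross-CU forms, liveness of a CU depends on that CU alone.
    return "two-pass" if may_have_inter_cu_refs(abbrevs) else "single-pass"
```
</pre>
<p>Note that this check is conservative: <code>DW_FORM_ref_addr</code> may also be used for a reference that happens to stay inside the same CU, so some inputs would take the heavy path without strictly needing it.</p>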
</div>
</blockquote>
<div>(2) Yeah, that /may/ be a good idea, cheap to test, etc.
Though I'd still wonder if a more general implementation
strategy could be found that would make it easier to get a
sliding scale of efficiency depending on how many inter-CU
references there were, not a "if there are none it's good,
if there are any it's bad or otherwise very different to
implement". <br>
</div>
</div>
</div>
</blockquote>
At the current point, I do not see how that could be done. <br>
One possibility is to preliminarily mark each CU with an IsReferenced flag.<br>
Then we could delay cloning of such CUs (either by putting them into<br>
the CU liveness window or by unloading them). <br>
CUs that are not referenced could be cloned immediately. Such a solution would
be <br>
more scalable and work well in cases when only a few inter-CU
references<br>
exist. However, it requires changes to the DWARF format.<br>
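<p>A minimal sketch of the IsReferenced idea, assuming the flag is already available when a CU is first seen (which is exactly the part that would need a format change). The <code>CompileUnit</code> class and the <code>plan_cloning</code> function are hypothetical names for illustration only:</p>
<pre>
```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompileUnit:
    name: str
    is_referenced: bool  # hypothetical flag: some other CU refers into this one


def plan_cloning(units: List[CompileUnit]) -> List[str]:
    """Return CU names in the order they would be cloned.

    Unreferenced CUs are cloned as soon as they are seen. Referenced CUs
    are delayed (kept in a liveness window, or unloaded and reloaded)
    until every CU has been analysed.
    """
    cloned: List[str] = []
    delayed: List[str] = []
    for cu in units:
        if cu.is_referenced:
            delayed.append(cu.name)
        else:
            cloned.append(cu.name)
    # All CUs have been seen, so liveness of the delayed ones is now final.
    cloned.extend(delayed)
    return cloned
```
</pre>
<p>With only a few referenced CUs, almost everything is cloned in the first loop and the delayed tail stays short, which is the sliding scale discussed above.</p>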
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAENS6EuOz-rVnp5ry=PCf9RUO3Rj-ZarU-jwrqGmBqj53Kjy=g@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
Then the following scheme would be used for worst cases:<br>
<br>
1. for (CU : CU1...CU100) {<br>
load CU.<br>
analyse CU.<br>
unload CU.<br>
} <br>
2. for (CU : CU1...CU100) {<br>
load CU.<br>
clone CU.<br>
unload CU.<br>
} <br>
3. fixup forward references.<br>
<br>
and the following scheme for light cases:<br>
<br>
1. for (CU : CU1...CU100) {<br>
load CU.<br>
analyse CU.<br>
clone CU.<br>
unload CU.<br>
}<br>
2. fixup forward references.<br>
</p>
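<p>The two schemes can also be sketched as schedules of steps, which makes the extra loading cost of the worst-case path explicit. This is only an illustration: the function names and the step labels are made up here, and real loading, analysis and cloning are replaced by recorded tuples:</p>
<pre>
```python
from typing import List, Optional, Tuple

Step = Tuple[str, Optional[str]]


def two_pass_schedule(cus: List[str]) -> List[Step]:
    """Worst-case scheme: analyse every CU first, then clone every CU."""
    steps: List[Step] = []
    for cu in cus:
        steps += [("load", cu), ("analyse", cu), ("unload", cu)]
    for cu in cus:
        steps += [("load", cu), ("clone", cu), ("unload", cu)]
    steps.append(("fixup forward references", None))
    return steps


def one_pass_schedule(cus: List[str]) -> List[Step]:
    """Light scheme: analyse and clone each CU while it is loaded."""
    steps: List[Step] = []
    for cu in cus:
        steps += [("load", cu), ("analyse", cu), ("clone", cu), ("unload", cu)]
    steps.append(("fixup forward references", None))
    return steps


def load_count(steps: List[Step]) -> int:
    """Count how many times a CU has to be loaded in a schedule."""
    return sum(1 for op, _ in steps if op == "load")
```
</pre>
<p>For N compile units the worst-case schedule loads 2N times and the light schedule N times; both end with the same fixup step.</p>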
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>Generally, I think we should not assume that
inter-CU references would be used in a limited
way.<br>
<br>
Anyway, if this scheme: <br>
<br>
1. analyse all CUs.<br>
2. clone all CUs.<br>
<p>turns out to be too slow, then we would need to
continue with a one-pass solution and not
support complex, closely coupled inputs.<br>
</p>
</div>
</blockquote>
<div><br>
</div>
<div>yeah, certainly seeing the data/experiments
will be interesting, if you end up implementing
some different strategies, etc.<br>
<br>
I guess one possibility for parallel generation
could be something more like Microsoft's approach
with a central debug info server that compilers
communicate with - not that exact model, I mean,
but if you've got parallel threads generating
reduced DWARF into separate object files - they
could communicate with a single thread responsible
for type emission - the type emitter would be
given types from the separate threads and compute
their size, queue them up to be streamed out to
the type CU (& keep the source CU alive until
that work was done) - such a central type emitter
could quickly determine the size of the type to be
emitted and compute future type offsets (eg: if 5
types were in the queue, it could've figured out
the offset of those types already) to answer type
offset queries quickly and unblock the parallel
threads to continue emitting their CUs containing
type references.<br>
</div>
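<div>A rough sketch of the offset-answering part of such a central type emitter, assuming each worker can report a type's encoded size up front and that types are identified by some unique key. The <code>TypeEmitter</code> class and its methods are hypothetical, and real DWARF emission (abbreviations, nested types, variable-length encodings) is ignored:</div>
<pre>
```python
import threading
from typing import Dict, List, Tuple


class TypeEmitter:
    """Hands out offsets in a shared type CU before the bytes are written."""

    def __init__(self, base_offset: int = 0) -> None:
        self._lock = threading.Lock()
        self._next_offset = base_offset
        self._offsets: Dict[str, int] = {}
        self._queue: List[Tuple[str, int, int]] = []  # (key, offset, size)

    def request_offset(self, type_key: str, size: int) -> int:
        """Return the final offset of a type, queueing it on first request."""
        with self._lock:
            known = self._offsets.get(type_key)
            if known is not None:
                return known  # already queued or emitted: deduplicated
            offset = self._next_offset
            self._offsets[type_key] = offset
            self._next_offset += size
            self._queue.append((type_key, offset, size))
            return offset

    def pending(self) -> List[Tuple[str, int, int]]:
        """Snapshot of the types still waiting to be streamed out."""
        with self._lock:
            return list(self._queue)
```
</pre>
<div>Because an offset is fixed as soon as the size is known, a worker thread gets its answer immediately and does not have to wait for the type to actually be written.</div>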
</div>
</div>
</blockquote>
<p>Yes, thank you. I will think about it.<br>
</p>
<p>Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
- Dave </div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>