<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 2, 2020 at 2:26 AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 02.11.2020 04:11, David Blaikie
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I think if we're in the realm of DWARF extensions a
whole bunch of other considerations come into it (& indeed,
your suggested proposal may be a good one - but I think it's a
very wide problem space once we're considering DWARF
extensions). Mostly I was making arguments/suggestions/thoughts
on the basis of being compatible with all existing DWARF
producers.</div>
</blockquote>
<p>The described scenario does not assume DWARF extensions. A global
type table is not a new DWARF construct; it is an artificial CU
that holds all types. That solution would be compatible with existing
DWARF consumers/producers.<br></p></div></blockquote><div>Sorry, guess I'm not following. Maybe this conversation's getting a bit too abstract/theoretical/forward looking for me right now - no worries. Happy to chat more about it, but might be easier to focus on the immediate steps forward for now & tackle this when it's the thing you're planning to work on? (if I'm understanding correctly that this isn't a direction you're thinking to try right now)<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
</p>
<p><br>
</p>
<blockquote type="cite"><br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Nov 1, 2020 at 2:05 PM
Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 28.10.2020 20:38, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020
at 6:01 AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 28.10.2020 01:49, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue,
Oct 27, 2020 at 12:34 PM Alexey Lapshin
<<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 27.10.2020 20:32, David Blaikie
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Tue, Oct 27, 2020 at 1:23 AM
Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 26.10.2020 22:38,
David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Sun, Oct 25, 2020 at
9:31 AM Alexey Lapshin
<<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 23.10.2020
19:43, David
Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>Ah, yeah
- that seems
like a missed
opportunity -
duplicating
the whole type
DIE. LTO does
this by making
monolithic
types -
merging all
the members
from different
definitions of
the same type
into one, but
that's maybe
too expensive
for dsymutil
(might still
be interesting
to know how
much more
expensive,
etc). But I
think the
other way to
go would be to
produce a
declaration of
the type, with
the relevant
members - and
let the DWARF
consumer
identify this
declaration as
matching up
with the
earlier
definition.
That's the
sort of DWARF
you get from
the non-MachO
default
-fno-standalone-debug
anyway, so
it's already
pretty well
tested/supported
(support in
lldb's a bit
younger/more
work-in-progress,
admittedly). I
wonder how
much dsym size
there is that
could be
reduced by
such an
implementation.</div>
</div>
</div>
</blockquote>
<p>I see. Yes, that could be done, and I think it would
result in a noticeable size reduction (I do not know the
exact numbers at the moment).</p>
<p>I am working on a multi-threaded DWARFLinker now, and
its first version will do exactly the same type
processing as the current dsymutil.</p>
</blockquote>
<div>Yeah,
best to keep
the behavior
the same
through that</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>The above scheme could be implemented as a next step, and
it would result in a better size reduction than the
current state.</p>
<p>But I think an even better scheme is possible, one that
would result in a bigger size reduction and faster
execution. It is similar to what you've described
above: "LTO does - making monolithic types - merging all
the members from different definitions of the same type
into one".</p>
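<p>A rough sketch of the "monolithic type" merging quoted above, assuming a type is identified by its name and that each definition carries only a subset of the members. <code>TypeDef</code> and <code>merge_type_definitions</code> are hypothetical names used for illustration only; they are not DWARFLinker or LTO APIs.</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """Hypothetical stand-in for one definition of a type found in one CU."""
    name: str
    members: list[str] = field(default_factory=list)


def merge_type_definitions(defs: list[TypeDef]) -> dict[str, TypeDef]:
    """Merge all definitions that share a name into one monolithic type.

    Members keep the order in which they are first seen; duplicates are dropped.
    """
    merged: dict[str, TypeDef] = {}
    for d in defs:
        target = merged.setdefault(d.name, TypeDef(d.name))
        for m in d.members:
            if m not in target.members:
                target.members.append(m)
    return merged
```
</pre>
<p>In this sketch the merged result is only complete after every definition has been seen, which illustrates the buffering concern raised in the reply that follows.</p>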
</div>
</blockquote>
<div>I believe
the reason
that's
probably not
been done is
that it can't
be streamed -
it'd lead to
buffering more
of the output
</div>
</div>
</div>
</blockquote>
<p>Yes. The fact that DWARF has to be streamed into
AsmPrinter complicates parallel DWARF generation. In my
prototype, I generate several output files (one per
source compilation unit) and then sequentially glue
them into the final output file.<br>
</p>
</div>
</blockquote>
<div>How does that help?
Do you use relocations
in those intermediate
object files so the
DWARF in them can
refer across files? <br>
</div>
</div>
</div>
</blockquote>
<p>It does not help with referring across files.
It helps to parallelize the generation of CU bodies. <br>
It is not possible to write two CUs in parallel into one
AsmPrinter. To make parallel generation possible, I stream
them into different AsmPrinters. (This comment was a reply
to "I believe the reason that's probably not been
done is that it can't be streamed", which initially
was about referring across the file, but it seems I
added another direction.)<br>
</p>
</div>
</blockquote>
<div>Oh, I see - thanks for
explaining, essentially
buffering on-disk. <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>(if two
of these
expandable
types were in
one CU - the
start of the
second type
couldn't be
known until
the end
because it
might keep
getting pushed
later due to
expansion of
the first
type) and/or
having to
revisit all
the type
references
(the offset to
the second
type wouldn't
be known until
the end - so
writing the
offsets to
refer to the
type would
have to be
deferred until
then).<br>
</div>
</div>
</div>
</blockquote>
<p>That is the second problem: offsets are not known
until the end of the file.<br>
dsymutil already has that situation for inter-CU
references, so it has an extra pass to fix up offsets. </p>
</div>
</blockquote>
<div>Oh, it does? I
figured it was
one-pass, and that it
only ever refers back
to types in previous
CUs? So it doesn't
have to go back and do
a second pass. But I
guess if it sees a
declaration of T1 in
CU1, then later on
sees a definition of
T1 in CU2, does it
somehow go back to CU1
and remove the
declaration/make
references refer to
the definition in CU2?
I figured it'd just
leave the declaration
and references to it
as-is, then add the
definition and use
that from CU2 onwards?
<br>
</div>
</div>
</div>
</blockquote>
<p>For the processing of
types, it does not go back. <br>
This "I figured it was
one-pass, and that it only
ever refers back to types in
previous CUs" <br>
and this "I figured it'd
just leave the declaration
and references to it as-is,
then add the definition and
use that from CU2 onwards"
are correct. <br>
</p>
</div>
</blockquote>
<div>Great - thanks for
explaining/confirming! </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>With a multi-threaded implementation, such a
situation would arise more often for type
references, so more offsets would have to be
fixed during the additional pass.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>DWARFLinker could create an additional artificial
compile unit and put all merged types there, then
later patch all type references to point into this
additional compilation unit. No bits would be
duplicated in that case. The performance improvement
would come from copying less DWARF and from the fact
that type references could be updated while the DWARF
is cloned (no additional pass is needed for that).<br>
</p>
</div>
</blockquote>
<div>"later
patch all type
references to
point into
this
additional
compilation
unit" - that's
the additional
pass that
people are
probably
talking/concerned
about.
Rewalking all
the DWARF. The
current
dsymutil
approach, as
far as I know,
is single pass
- it knows the
final,
absolute
offset to the
type from the
moment it
emits that
type/needs to
refer to it. <br>
</div>
</div>
</div>
</blockquote>
<p>Right. The current dsymutil approach is single pass.
From that point of view, the solution you've
described (producing a declaration of the type with
the relevant members) allows keeping that single-pass
implementation.<br>
<br>
But the current dsymutil approach has a restriction:
to process inter-CU references it needs to load all
DWARF into memory (while it analyzes which part of
the DWARF is live, it needs to have all CUs loaded
into memory).</p>
</div>
</blockquote>
<div>All DWARF for a
single file (which for
dsymutil is mostly a
single CU, except with
LTO I guess?), not all
DWARF for all inputs
in memory at once,
yeah? <br>
</div>
</div>
</div>
</blockquote>
<p>Right. In the dsymutil case, it is all DWARF for a
single file (not all DWARF for all inputs in memory at once).<br>
But in the llvm-dwarfutil case, a single file contains
the DWARF for all original input object files, and it
all gets loaded into memory.<br>
</p>
</div>
</blockquote>
<div>Yeah, would be great to try
to go CU-by-CU. </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>That leads to huge memory usage.
<br>
It matters less when the source is a set of object
files (as in the dsymutil case), but it becomes a real
problem for the llvm-dwarfutil utility, where the
source is a single file (with the current
implementation it needs 30G of memory for
processing the clang binary).<br>
</p>
</div>
</blockquote>
<div>Yeah, that's where
I think you'd need a
fixup pass one way or
another - because
cross-CU references
can mean that when you
figure out a new
layout for CU5
(because it has a
duplicate type
definition of
something in CU1) then
you might have to
touch CU4 that had an
absolute/cross-CU
forward reference to
CU5. Once you've got
such a fixup pass (if
dsymutil already has
one? Which, like I
said, I'm confused why
it would have one/that
doesn't match my very
vague understanding)
then I think you could
make dsymutil work on
a per-CU basis
streaming things out,
then fixing up a few
offsets.<br>
</div>
</div>
</div>
</blockquote>
<p>When dsymutil deduplicates types, it changes a local CU
reference into an inter-CU reference (so that CU2 (next)
can reference a type definition from CU1 (prev)).
It currently does not need any fixups to make this
change.<br>
<br>
When dsymutil meets an inter-CU reference that already
exists in the input object file and points into a
CU which has not been processed yet (so its
offset is unknown), it marks it as a "forward reference"
and patches it later, during the additional "fixup
forward references" pass, at a time when the offsets
are known.
<br>
</p>
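<p>A minimal sketch of the forward-reference fixup described above, under simplifying assumptions: each unit is modelled as filler bytes followed by 4-byte reference slots, and the fixup runs once at the end. <code>link_units</code> and its input format are hypothetical and do not mirror the real dsymutil code.</p>
<pre>
```python
REF_SIZE = 4  # assumption: 32-bit DWARF, so references are 4-byte offsets


def link_units(units):
    """Emit units in order and patch references to later units afterwards.

    `units` is a list of (unit_name, body_size, referenced_unit_names).
    Each unit is emitted as body_size zero bytes followed by one
    reference slot per referenced unit.
    """
    out = bytearray()
    unit_offset = {}  # unit name -> final offset, known once the unit is emitted
    fixups = []       # (position of the slot in `out`, target unit name)

    for name, body_size, refs in units:
        unit_offset[name] = len(out)
        out += bytes(body_size)
        for target in refs:
            if target in unit_offset:
                # Backward reference: the final offset is already known.
                out += unit_offset[target].to_bytes(REF_SIZE, "little")
            else:
                # Forward reference: write a placeholder and remember the slot.
                fixups.append((len(out), target))
                out += b"\xff" * REF_SIZE

    # "Fixup forward references": every unit offset is known by now.
    for pos, target in fixups:
        out[pos:pos + REF_SIZE] = unit_offset[target].to_bytes(REF_SIZE, "little")
    return bytes(out), unit_offset, len(fixups)
```
</pre>
<p>Only references to units that have not been emitted yet end up in the fixup list; backward references are written with their final value straight away, which is why type deduplication against earlier CUs needs no fixups.</p>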
</div>
</blockquote>
<div>OK, so limited 2 pass system.
(does it do that second pass
once at the end of the whole
dsymutil run, or at the end of
each input file? (so if an input
file has two CUs and the first
CU references a type in the
second CU - it could write the
first CU with a "forward
reference", then write the
second CU, then fixup the
forward reference - and then go
on to the next file and its CUs
- this could improve performance
by touching recently used
memory/disk pages only, rather
than going all the way back to
the start later on when those
pages have become cold))</div>
</div>
</div>
</blockquote>
<p>Yes, it does it at the end of each
input file.</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
If CUs were processed in
parallel, their offsets would
not be known at the moment
when a local type reference
is changed into an
inter-CU reference. So we
would need to do the same
fix-up processing for all
type references that
we already do for other
inter-CU references.<br>
</p>
</div>
</blockquote>
<div>Yeah - though the existence
of this second "fixup forward
references" system - yeah, could
just use it much more generally
as you say. Not an extra pass,
just the existing second pass
but having way more fixups to
fixup in that pass.</div>
</div>
</div>
</blockquote>
If we were able to change the
algorithm in this way: <br>
<br>
1. analyse all CUs.<br>
2. clone all CUs.<br>
<br>
Then we could create a merged type
table (an artificial CU containing types)
during step 1. <br>
If that type table were written
first, then all following CUs could use
known offsets to the types, and we would not need
additional fix-up processing for type
references. <br>
It would still be necessary to fix up
other inter-CU references, but not
type references (which
constitute the vast majority).<br>
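<p>The two steps above can be sketched as follows. This is only an illustration under stated assumptions: types are keyed by name, their encoded sizes are given, and the type table starts after an 11-byte unit header (the size of a 32-bit DWARF v4 compile unit header). The helper names are hypothetical.</p>
<pre>
```python
def build_type_table(cus, header_size=11):
    """Step 1: analyse all CUs and lay out one merged type table.

    `cus` maps a CU name to a list of (type_name, encoded_size) pairs.
    Returns (type name -> offset inside the artificial type CU, table size).
    """
    offsets = {}
    cursor = header_size  # assumption: types start right after the CU header
    for types in cus.values():
        for type_name, size in types:
            if type_name not in offsets:  # keep one copy per type name
                offsets[type_name] = cursor
                cursor += size
    return offsets, cursor


def clone_units(cus, type_offsets):
    """Step 2: clone CUs; every type reference is resolved immediately."""
    return {
        cu: [type_offsets[type_name] for type_name, _ in types]
        for cu, types in cus.items()
    }
```
</pre>
<p>Because the table layout is final before any CU is cloned, no type reference has to be recorded for a later fixup in this sketch.</p>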
</div>
</blockquote>
<div><br>
</div>
<div>To me, that sounds more expensive than
the fixup forward references pass.</div>
</div>
</div>
</blockquote>
<p>In a direct comparison, yes: loading the DWARF one more
time looks more expensive than the fixup forward references
pass. But looking at the general picture, it could probably
be beneficial:<br>
<br>
1. Merging types would lead to a smaller resulting DWARF,
which would speed up the process.<br>
For example, if we switched "odr types
deduplication" off in the current implementation,
the execution time would double.
That is because more DWARF would have to be cloned and
written to the result. Implementing
"merging types" would probably have a similar
effect: it would speed up the overall process. So
on one side the additional step for loading DWARF
would decrease performance, but on the other a smaller
amount of resulting data would increase performance.<br>
<br>
2. If types are put in the first CU, then
we have a simple strategy for our liveness
analysis algorithm: just always keep the first
CU in memory. This allows us to speed up our
liveness analysis step.<br>
<br>
Anyway, all the above is just an idea for future
work. Currently, I am going to implement
multi-threaded processing for CUs loaded into
memory, with the same type processing as
it currently has (which means that the "fixup
forward references pass" will do more work
by fixing type references).<br>
</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Without loading all CUs into
memory, it would require a
two-pass solution: a first
pass to analyze which part of
the DWARF relates to live code,
and a second pass to
generate the result. <br>
</p>
</div>
</blockquote>
<div>Not sure it'd
require any more
second pass than a
"fixup" pass, which it
sounds like you're
saying it already has?
<br>
</div>
</div>
</div>
</blockquote>
<p>It looks like it would need
an additional pass to
process inter-CU
references (those that exist in the
incoming file) if we do not
want to load all CUs into
memory.<br>
</p>
</div>
</blockquote>
<div>Usually inter-CU references
aren't used, except in LTO - and
in LTO all the DWARF
deduplication and function
discarding is already done by
the IR linker anyway. (ThinLTO
is a bit different, but really
we'd be better off teaching it
the extra tricks anyway (some
can't be fixed in ThinLTO - like
emitting a "Home" definition of
an inline function, only to find
out other ThinLTO backend/shards
managed to optimize away all
uses of the function... so some
cleanup may be useful there)).
It might be possible to do a
more dynamic/rolling cache -
keep only the CUs with
unresolved cross-CU references
alive and only keep them alive
until their cross-CU references
are found/marked alive. This
should make things no worse than
the traditional dsymutil case -
since cross-CU references are
only effective/generally used
within a single object file
(it's possible to create
relocations for them into other
files - but I know LLVM doesn't
currently do this and I don't
think GCC does it) with multiple
CUs anyway - so at most you'd
keep all the CUs from a single
original input file alive
together.<br>
</div>
</div>
</div>
</blockquote>
But since it is a case documented by DWARF,
the tool should be ready for it
(when inter-CU references are heavily used).</div>
</blockquote>
<div><br>
Sure - but by implementing a CU liveness
window like that (keeping CUs live only so
long as they need to be rather than an
all-or-nothing approach) only especially
quirky inputs would hit the worst case
while the more normal inputs could perform
better.<br>
</div>
</div>
</div>
</blockquote>
<p>It is not clear what should be put into such a CU
liveness window. If CU100 references CU1, how
could we know that we need to put CU1 into the CU
liveness window before we have processed CU100?<br>
</p>
</div>
</blockquote>
<div>Fair point, not just forward references to worry
about but backward references too. I wonder how much
savings there is in the liveness analysis compared
to "keep one copy of everything, no matter whether
it's live or not", then it can be a pure forward
progress situation. (with the quirk that you might
emit a declaration for an entity once, then a
definition for it later - alternatively if a
declaration is seen it could be skipped under the
assumption that a definition will follow (& use
a forward ref fixup) - and if none is found, splat
some stub declarations into a trailing CU at the
end) <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> Moreover, llvm-dwarfutil would be
the tool producing exactly such a situation. The resulting
file (produced by llvm-dwarfutil) would
contain a lot of inter-CU references. There is probably
no practical reason to apply
llvm-dwarfutil to the same file twice, but it would be
a good test for the tool.<br>
</div>
</blockquote>
<div><br>
It'd be a good stress test, but not
necessarily something that would need to
perform the best because it wouldn't be a
common use case.<br>
</div>
</div>
</div>
</blockquote>
<p>I agree that we should not slow down the
DWARFLinker in common cases only because we need
to support the worst cases.<br>
But we also need to implement a solution which
works in some acceptable manner for the worst
case. </p>
</div>
</blockquote>
<div>I think that depends on "acceptable" - correct,
yes. Practical to run in reasonable time/memory? Not
necessarily, in my opinion. <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>The current solution - loading everything into
memory - makes it hard to use in a non-dsymutil
scenario (llvm-dwarfutil).<br>
</p>
</div>
</blockquote>
<div>I agree it's worth exploring the non-dsymutil
scenario, as you are - I'm just saying we don't
necessarily need to support high usability (fast/low
memory usage/etc) llvm-dwarfutil on an already
dwarfutil'd binary (but as you've pointed out, the
"window" is unknowable because of backward
references, so this whole subthread is perhaps
irrelevant).<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>There could be several things which could be
used to decide whether we need to go on a light
or heavy path:<br>
<br>
1. If the input contains only a single CU we do
not need to unload it from memory. Thus - we
would not need to do an extra DWARF loading
pass.<br>
2. If abbreviations from the whole input file do
not contain inter-CU references then while doing
liveness analysis, we do not need to wait until
other CUs are processed.<br>
</p>
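<p>A small sketch of the light/heavy decision listed above. It assumes that inter-CU references can be detected by looking for <code>DW_FORM_ref_addr</code> (value 0x10 in the DWARF standard) among the attribute forms in the abbreviation tables; other cross-unit forms are ignored here, and <code>needs_heavy_path</code> is a hypothetical helper.</p>
<pre>
```python
DW_FORM_ref_addr = 0x10  # DWARF form used for references that may cross CUs


def needs_heavy_path(cu_count, abbrev_forms):
    """Return True when the two-pass (analyse, then clone) path is needed.

    `abbrev_forms` is the collection of attribute forms found in the
    abbreviation tables of the whole input file.
    """
    if cu_count == 1:
        return False  # a single CU never has to be unloaded and reloaded
    # Assumption: no DW_FORM_ref_addr in any abbreviation means no inter-CU
    # references, so liveness analysis never waits for other CUs.
    return DW_FORM_ref_addr in abbrev_forms
```
</pre>
<p>The check only reads abbreviation tables, which is what makes it cheap compared with scanning every DIE.</p>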
</div>
</blockquote>
<div>(2) Yeah, that /may/ be a good idea, cheap to
test, etc. Though I'd still wonder if a more general
implementation strategy could be found that would
make it easier to get a sliding scale of efficiency
depending on how many inter-CU references there
were, not a "if there are none it's good, if there
are any it's bad or otherwise very different to
implement". <br>
</div>
</div>
</div>
</blockquote>
<p>I think there is a scenario which would make it possible
to process each non-referenced CU once and to handle
inter-CU references in a scalable way (even for a dwarfutil'd
binary):<br>
<br>
1. Implement a global type table and type merging. This
allows us to keep all types in memory. <br>
Then all inter-CU type references would point into
that in-memory type table. <br>
(We do not know which CUs should be put into the CU liveness
window, and we cannot put all CUs into memory, but
we can put all types into memory.)<br>
<br>
2. If there are no other inter-CU references, then all CUs
would be handled in one pass.<br>
<br>
3. If there are other inter-CU references, then after all
CUs have been processed by the first pass we would have a list of
referenced CUs. We could then delete the already cloned
data (for the referenced CUs) and start the process again: <br>
load CU, mark liveness, clone data. This second pass
would be done only for referenced CUs. <br>
For simple, loosely coupled cases it would
work relatively fast.<br>
<br>
4. Put the in-memory type table into an artificial CU. Update all
type references.<br>
</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
Then that scheme would be used for worst cases:<br>
<br>
1. for (CU : CU1...CU100) {<br>
load CU.<br>
analyse CU.<br>
unload CU.<br>
} <br>
2. for (CU : CU1...CU100) {<br>
load CU.<br>
clone CU.<br>
unload CU.<br>
} <br>
3. fixup forward references.<br>
<br>
and that scheme for light cases:<br>
<br>
1. for (CU : CU1...CU100) {<br>
load CU.<br>
analyse CU.<br>
clone CU.<br>
unload CU.<br>
}<br>
2. fixup forward references.<br>
</p>
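<p>The two schemes above can be written as runnable loops that only record the order of operations. This is a sketch for comparing how often each CU is loaded; <code>heavy_path</code> and <code>light_path</code> are hypothetical names and do no real DWARF work.</p>
<pre>
```python
def heavy_path(cus):
    """Worst case: analyse every CU first, then reload and clone each one."""
    steps = []
    for cu in cus:
        steps += [f"load {cu}", f"analyse {cu}", f"unload {cu}"]
    for cu in cus:
        steps += [f"load {cu}", f"clone {cu}", f"unload {cu}"]
    steps.append("fixup forward references")
    return steps


def light_path(cus):
    """Light case: analyse and clone each CU while it is loaded once."""
    steps = []
    for cu in cus:
        steps += [f"load {cu}", f"analyse {cu}", f"clone {cu}", f"unload {cu}"]
    steps.append("fixup forward references")
    return steps
```
</pre>
<p>The heavy scheme loads every CU twice, which is the extra cost being weighed in this thread against keeping all CUs in memory.</p>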
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>Generally, I think we should not
assume that inter-CU references would be
used in a limited way.<br>
<br>
Anyway, if this scheme: <br>
<br>
1. analyse all CUs.<br>
2. clone all CUs.<br>
<p>turns out to be slow, then we would need to
continue with the one-pass solution and
not support complex, closely coupled
inputs.<br>
</p>
</div>
</blockquote>
<div><br>
</div>
<div>yeah, certainly seeing the
data/experiments will be interesting, if
you end up implementing some different
strategies, etc.<br>
<br>
I guess one possibility for parallel
generation could be something more like
Microsoft's approach with a central debug
info server that compilers communicate
with - not that exact model, I mean, but
if you've got parallel threads generating
reduced DWARF into separate object files -
they could communicate with a single
thread responsible for type emission - the
type emitter would be given types from the
separate threads and compute their size,
queue them up to be streamed out to the
type CU (& keep the source CU alive
until that work was done) - such a central
type emitter could quickly determine the
size of the type to be emitted and compute
future type offsets (eg: if 5 types were
in the queue, it could've figured out the
offset of those types already) to answer
type offset queries quickly and unblock
the parallel threads to continue emitting
their CUs containing type references.<br>
</div>
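<p>A minimal sketch of the central type emitter idea described above. Instead of a dedicated thread it uses a lock-protected object shared by the worker threads, which is a simplification. It assumes the encoded size of a type is known when the offset is requested; <code>TypeEmitter</code> is a hypothetical name.</p>
<pre>
```python
import threading


class TypeEmitter:
    """Hypothetical central emitter that hands out offsets in the type CU."""

    def __init__(self, start_offset=0):
        self._lock = threading.Lock()
        self._next_offset = start_offset
        self._offsets = {}  # type name -> offset reserved in the type CU
        self.queue = []     # (offset, type name, size) waiting to be streamed

    def request_offset(self, type_name, size):
        """Reserve space for a type once and return its final offset."""
        with self._lock:
            if type_name not in self._offsets:
                self._offsets[type_name] = self._next_offset
                self.queue.append((self._next_offset, type_name, size))
                self._next_offset += size
            return self._offsets[type_name]
```
</pre>
<p>A worker thread gets its answer as soon as the size is accounted for, even though the queued type bytes may be streamed out later.</p>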
</div>
</div>
</blockquote>
<p>Yes, thank you. I will think about it.<br>
</p>
<p>Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
- Dave </div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div></div>