<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 27.10.2020 20:32, David Blaikie
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAENS6EuhUXULaY8hmU=RbgNmbJ6=ZPKv939SHdtGFyJmZNyA1A@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Oct 27, 2020 at 1:23
AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com"
moz-do-not-send="true">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 26.10.2020 22:38, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Oct 25,
2020 at 9:31 AM Alexey Lapshin <<a
href="mailto:avl.lapshin@gmail.com"
target="_blank" moz-do-not-send="true">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 23.10.2020 19:43, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div><br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>Ah, yeah - that seems like a
missed opportunity - duplicating
the whole type DIE. LTO does
this by making monolithic types
- merging all the members from
different definitions of the
same type into one, but that's
maybe too expensive for dsymutil
(might still be interesting to
know how much more expensive,
etc). But I think the other way
to go would be to produce a
declaration of the type, with
the relevant members - and let
the DWARF consumer identify this
declaration as matching up with
the earlier definition. That's
the sort of DWARF you get from
the non-MachO default
-fno-standalone-debug anyway, so
it's already pretty well
tested/supported (support in
lldb's a bit younger/more
work-in-progress, admittedly). I
wonder how much dsym size there
is that could be reduced by such
an implementation.</div>
</div>
</div>
</blockquote>
<p>I see. Yes, that could be done and I
think it would result in noticeable
size reduction(I do not know exact
numbers at the moment).</p>
<p>I work on multi-thread DWARFLinker
now and it`s first version will do
exactly the same type processing like
current dsymutil.</p>
</blockquote>
<div>Yeah, best to keep the behavior the
same through that</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Above scheme could be implemented
as a next step and it would result
in better size reduction(better than
current state).</p>
<p>But I think the better scheme could
be done also and it would result in
even bigger size reduction and in
faster execution. This scheme is
something similar to what you`ve
described above: "LTO does - making
monolithic types - merging all the
members from different definitions
of the same type into one".</p>
</div>
</blockquote>
<div>I believe the reason that's probably
not been done is that it can't be
streamed - it'd lead to buffering more
of the output </div>
</div>
</div>
</blockquote>
<p>yes. The fact that DWARF should be streamed
into AsmPrinter complicates parallel dwarf
generation. In my prototype, I generate <br>
several resulting files(each for one source
compilation unit) and then sequentially glue
them into the final resulting file.<br>
</p>
</div>
</blockquote>
<div>How does that help? Do you use relocations in
those intermediate object files so the DWARF in
them can refer across files? <br>
</div>
</div>
</div>
</blockquote>
<p>It does not help with referring across the file. It
helps to parallel the generation of CU bodies. <br>
It is not possible to write two CUs in parallel into
AsmPrinter. To make possible parallel generation I
stream them into different AsmPrinters(this comment is
for "I believe the reason that's probably not been done
is that it can't be streamed". which initially was about
referring across the file, but it seems I added another
direction).<br>
</p>
</div>
</blockquote>
<div>Oh, I see - thanks for explaining, essentially buffering
on-disk. <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>(if two of these expandable types
were in one CU - the start of the second
type couldn't be known until the end
because it might keep getting pushed
later due to expansion of the first
type) and/or having to revisit all the
type references (the offset to the
second type wouldn't be known until the
end - so writing the offsets to refer to
the type would have to be deferred until
then).<br>
</div>
</div>
</div>
</blockquote>
<p>That is the second problem: offsets are not
known until the end of file.<br>
dsymutil already has that situation for
inter-CU references, so it has extra pass to<br>
fixup offsets. </p>
</div>
</blockquote>
<div>Oh, it does? I figured it was one-pass, and
that it only ever refers back to types in previous
CUs? So it doesn't have to go back and do a second
pass. But I guess if sees a declaration of T1 in
CU1, then later on sees a definition of T1 in CU2,
does it somehow go back to CU1 and remove the
declaration/make references refer to the
definition in CU2? I figured it'd just leave the
declaration and references to it as-is, then add
the definition and use that from CU2 onwards? <br>
</div>
</div>
</div>
</blockquote>
<p>For the processing of the types, it do not go back. <br>
This "I figured it was one-pass, and that it only ever
refers back to types in previous CUs" <br>
and this "I figured it'd just leave the declaration and
references to it as-is, then add the definition and use
that from CU2 onwards" are correct. <br>
</p>
</div>
</blockquote>
<div>Great - thanks for explaining/confirming! </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>With multi-thread implementation such
situation would arise more often <br>
for type references and so more offsets should
be fixed during additional pass.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>DWARFLinker could create additional
artificial compile unit and put all
merged types there. Later patch all
type references to point into this
additional compilation unit. No any
bits would be duplicated in that
case. The performance improvement
could be achieved due to less amount
of the copied DWARF and due to the
fact that type references could be
updated when DWARF is cloned(no need
in additional pass for that).<br>
</p>
</div>
</blockquote>
<div>"later patch all type references to
point into this additional compilation
unit" - that's the additional pass that
people are probably talking/concerned
about. Rewalking all the DWARF. The
current dsymutil approach, as far as I
know, is single pass - it knows the
final, absolute offset to the type from
the moment it emits that type/needs to
refer to it. <br>
</div>
</div>
</div>
</blockquote>
<p>Right. Current dsymutil approach is single
pass. And from that point of view, solution <br>
which you`ve described(to produce a
declaration of the type, with the relevant
members) <br>
allows to keep that single pass
implementation.<br>
<br>
But there is a restriction for current
dsymutil approach: To process inter-CU
references <br>
it needs to load all DWARF into the
memory(While it analyzes which part of DWARF
is live, <br>
it needs to have all CUs loaded into the
memory).</p>
</div>
</blockquote>
<div>All DWARF for a single file (which for dsymutil
is mostly a single CU, except with LTO I guess?),
not all DWARF for all inputs in memory at once,
yeah? <br>
</div>
</div>
</div>
</blockquote>
<p>right. In dsymutil case - all DWARF for a single
file(not all DWARF for all inputs in memory at once).<br>
But in llvm-dwarfutil case single file contains DWARF
for all original input object files and it all becomes<br>
loaded into memory.<br>
</p>
</div>
</blockquote>
<div>Yeha, would be great to try to go CU-by-CU. </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>That leads to huge memory usage. <br>
It is less important when source is a set of
object files(like in dsymutil case) and this <br>
become a real problem for llvm-dwarfutil
utility when source is a single file(With
current <br>
implementation it needs 30G of memory for
compiling clang binary).<br>
</p>
</div>
</blockquote>
<div>Yeah, that's where I think you'd need a fixup
pass one way or another - because cross-CU
references can mean that when you figure out a new
layout for CU5 (because it has a duplicate type
definition of something in CU1) then you might
have to touch CU4 that had an absolute/cross-CU
forward reference to CU5. Once you've got such a
fixup pass (if dsymutil already has one? Which,
like I said, I'm confused why it would have
one/that doesn't match my very vague
understanding) then I think you could make
dsymutil work on a per-CU basis streaming things
out, then fixing up a few offsets.<br>
</div>
</div>
</div>
</blockquote>
<p>When dsymutil deduplicates types it changes local CU
reference into inter-CU reference(so that CU2(next)
could reference type definition from CU1(prev)). To do
this change it does not need to do any fixups currently.<br>
<br>
When dsymutil meets already existed(located in the input
object file) inter-CU reference pointing into the CU
which has not been processed yet(and then its offset is
unknown) it marks it as "forward reference" and patches
later during additional pass "fixup forward references"
at a time when offsets are known. <br>
</p>
</div>
</blockquote>
<div>OK, so limited 2 pass system. (does it do that second
pass once at the end of the whole dsymutil run, or at the
end of each input file? (so if an input file has two CUs and
the first CU references a type in the second CU - it could
write the first CU with a "forward reference", then write
the second CU, then fixup the forward reference - and then
go on to the next file and its CUs - this could improve
performance by touching recently used memory/disk pages
only, rather than going all the way back to the start later
on when those pages have become cold)</div>
</div>
</div>
</blockquote>
<p>yes, It does it in the end of each input file.</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAENS6EuhUXULaY8hmU=RbgNmbJ6=ZPKv939SHdtGFyJmZNyA1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
If CUs would be processed in parallel their offsets
would not be known at the moment when local type
reference would be changed into inter-CU reference. So
we would need to do the same fix-up processing for all
references to the types like we already do for other
inter-CU references.<br>
</p>
</div>
</blockquote>
<div>Yeah - though the existence of this second "fixup forward
references" system - yeah, could just use it much more
generally as you say. Not an extra pass, just the existing
second pass but having way more fixups to fixup in that
pass.</div>
</div>
</div>
</blockquote>
If we would be able to change the algorithm in such way : <br>
<br>
1. analyse all CUs.<br>
2. clone all CUs.<br>
<br>
Then we could create a merged type table(artificial CU containing
types) during step1. <br>
If that type table would be written first, then all following CUs
could use known offsets <br>
to the types and we would not need additional fix-up processing for
type references. <br>
It would still be necessary to fix-up other inter-CU references. But
it would not be necessary <br>
to fix-up type references (which constitute the vast majority).<br>
<br>
<blockquote type="cite"
cite="mid:CAENS6EuhUXULaY8hmU=RbgNmbJ6=ZPKv939SHdtGFyJmZNyA1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Without loading all CU into the memory it
would require two passes solution. First to
analyze <br>
which part of DWARF relates to live code and
then second pass to generate the result. <br>
</p>
</div>
</blockquote>
<div>Not sure it'd require any more second pass than
a "fixup" pass, which it sounds like you're saying
it already has? <br>
</div>
</div>
</div>
</blockquote>
<p>It looks like it would need an additional pass to
process inter-CU references(existed in incoming file) if
we do not want to load all CUs into memory.<br>
</p>
</div>
</blockquote>
<div>Usually inter-CU references aren't used, except in LTO -
and in LTO all the DWARF deduplication and function
discarding is already done by the IR linker anyway. (ThinLTO
is a bit different, but really we'd be better off teaching
it the extra tricks anyway (some can't be fixed in ThinLTO -
like emitting a "Home" definition of an inline function,
only to find out other ThinLTO backend/shards managed to
optimize away all uses of the function... so some cleanup
may be useful there)). It might be possible to do a more
dynamic/rolling cache - keep only the CUs with unresolved
cross-CU references alive and only keep them alive until
their cross-CU references are found/marked alive. This
should make things no worse than the traditional dsymutil
case - since cross-CU references are only
effective/generally used within a single object file (it's
possible to create relocations for them into other files -
but I know LLVM doesn't currently do this and I don't think
GCC does it) with multiple CUs anyway - so at most you'd
keep all the CUs from a single original input file alive
together.<br>
</div>
</div>
</div>
</blockquote>
But, since it is a DWARF documented case the tool should be ready
for such case(when inter-CU <br>
references are heavily used). Moreover, llvm-dwarfutil would be the
tool producing <br>
exactly such situation. The resulting file(produced by
llvm-dwarfutil) would contain a lot of <br>
inter-CU references. Probably, there is no practical reasons to
apply llvm-dwarfutil to the same <br>
file twice but it would be a good test for the tool.<br>
<br>
Generally, I think we should not assume that inter-CU references
would be used in a limited way.<br>
<br>
Anyway, if this scheme: <br>
<br>
1. analyse all CUs.<br>
2. clone all CUs.<br>
<p>would work slow then we would need to continue with one-pass
solution and not support complex closely coupled inputs.<br>
</p>
<p>Alexey.<br>
</p>
<blockquote type="cite"
cite="mid:CAENS6EuhUXULaY8hmU=RbgNmbJ6=ZPKv939SHdtGFyJmZNyA1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> When the input file contains inter-CU references,
DWARFLinker needs to follow them while doing liveness
marking. i.e. if the original CU has a live part which
references another CU we need to follow this new CU and
mark the referenced part as life. At the current moment,
while doing liveness analysis, we have all CUs in
memory. That allows us to load all CUs once and analyze
them all. In case llvm-dwarfutil(which loads all DWARF
for input file) it leads to huge memory usage. <br>
<br>
Let's say CU1 references CU100. And CU100 references
CU1. We could not start generation for CU1 until we
analyzed CU100 and marked the corresponding part of CU1
as life. At the same time, we could not load DWARF for
all CUs. Then processing(in simplified form) could look
like this:<br>
<br>
1: for (CU : CU1...CU100)<br>
load CU, do liveness analysis, remember references,
unload CU<br>
<br>
2: for (all references)<br>
load CU, do liveness analysis, unload CU<br>
<br>
3: for (CU : CU1...CU100)<br>
load CU, clone CU<br>
<br>
That is a simplified scheme, but I think it is enough to
show the idea. In this scheme we have 1 and 2 which
should be done before 3. <br>
</p>
<p><br>
</p>
<p>Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> If we would have a two passes solution then
we could create a compilation unit with all <br>
types at first pass and at the second pass we
could generate result with correct offsets(no
<br>
need to fix up them as it is currently
required by dsymutil for forward inter-CU
references).<br>
The open question currently: how expensive
this two passes approach is.<br>
</p>
<p>Thank you, Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>Anyway, that might be the next step
after multi-thread DWARFLinker would
be ready.<br>
</p>
</div>
</blockquote>
<div>Yep, be interesting to see how it all
goes! </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> <br>
Do you suggest that
0x0000011b should be
transformed into something
like that:<br>
<br>
0x000000fc:
DW_TAG_compile_unit<br>
DW_AT_language
(DW_LANG_C_plus_plus)<br>
DW_AT_name
("templ.cpp")<br>
DW_AT_stmt_list
(0x00000090)<br>
DW_AT_low_pc
(0x0000000100000fa0)<br>
DW_AT_high_pc
(0x0000000100000fab)<br>
<br>
0x0000011b:
DW_TAG_structure_type<br>
DW_AT_specification
(0x0000002a "x")<br>
<br>
0x00000124:
DW_TAG_subprogram<br>
DW_AT_linkage_name
("_ZN1x2f3IiEEiv")<br>
DW_AT_name
("f3<int>")<br>
DW_AT_type
(0x000000000000005e "int")<br>
DW_AT_declaration (true)<br>
DW_AT_external (true)<br>
DW_AT_APPLE_optimized (true)<br>
0x00000138: NULL<br>
0x00000139: NULL<br>
<br>
0x00000140:
DW_TAG_subprogram<br>
DW_AT_low_pc
(0x0000000100000fa0)<br>
DW_AT_high_pc
(0x0000000100000fab)<br>
DW_AT_specification
(0x0000000000000124
"_ZN1x2f3IiEEiv")<br>
0x00000155: NULL<br>
<br>
Did I correctly get the
idea?<br>
</div>
</blockquote>
<div><br>
</div>
<div>Yep, more or less. It'd be
"safer" if 11b didn't use
DW_AT_specification to refer
to 2a, but instead was only a
completely independent
declaration of "x" - that path
is already well
supported/tested (well, it's
the work-in-progress stuff for
lldb to support
-fno-standalone-debug, but
gdb's been consuming DWARF
like this for years, Clang and
GCC both produce DWARF like
this (if the type is "homed"
in another file, then
Clang/GCC produce DWARF that
emits a declaration with just
the members needed to define
any member functions
defined/inlined/referenced in
this CU)) for years.<br>
<br>
But using DW_AT_specification,
or maybe some other extension
attribute might make the
consumers task a bit easier
(could do both - use an
extension attribute to tie
them up, leave
DW_AT_declaration/DW_AT_name
here for consumers that don't
understand the extension
attribute) in finding that
they're all the same
type/pieces of teh same type.</div>
<div> </div>
</div>
</div>
</blockquote>
<p>yes. would try this solution.</p>
<p>Thank you, Alexey.<br>
</p>
<br>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>