<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 13.10.2020 10:20, James Henderson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CABqSp3=ObdqO+2jQpEMwnOL2O9nytD87q9WMDaWQaVwHA4_j_A@mail.gmail.com">
<div dir="ltr">
<div>The script included in the patch can be used to convert an
object containing normal DWARF into an object using fragmented
DWARF. It uses llvm-dwarfdump to dump the various sections,
parses the output to identify where it should split (using the
offsets of the various entries), and then writes new section
headers accordingly - you can see roughly what it's doing if
you get a chance to watch the talk recording. The additional
section headers are appended to the end of the ELF section
header table, whilst the original DWARF is left in the same
place it was before (making use of the fact that section
headers don't have to appear in offset order). The script also
parses and fragments the relocation sections targeting the
DWARF sections so that they match up with the fragmented DWARF
sections. This is clearly all suboptimal - in practice the
compiler should be modified to do the fragmenting upfront, to
save having to parse a tool's stdout - but it was the simplest
way I could come up with to write the script quickly. Full
details of the script usage are included in the patch
description, if you want to play around with it.</div>
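<div>A minimal sketch of the splitting step described above, not the
script from the patch. It assumes that every dumped entry begins
with its section offset in the form "0x...:"; the sample dump text,
the function names and the one-fragment-per-entry granularity are
illustrative assumptions only.</div>
<pre>
```python
import re

# Assumption: each dumped entry starts with its section offset as "0x...:".
OFFSET_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+):")


def entry_offsets(dump_text):
    """Collect the leading hex offset of every line that has one."""
    offsets = []
    for line in dump_text.splitlines():
        match = OFFSET_RE.match(line)
        if match:
            offsets.append(int(match.group(1), 16))
    return offsets


def fragment_ranges(offsets, section_size):
    """Turn split offsets into (offset, size) pairs covering the whole section."""
    starts = sorted(set(offsets) | {0})
    if starts[-1] > section_size:
        raise ValueError("offset lies beyond the end of the section")
    ends = starts[1:] + [section_size]
    # Skip empty ranges, e.g. when an offset equals the section size.
    return [(start, end - start) for start, end in zip(starts, ends) if end != start]


# Hypothetical dump excerpt, used only to exercise the two helpers.
SAMPLE_DUMP = """\
0x00000000: Compile Unit: length = 0x0000002a
0x0000000b: DW_TAG_compile_unit
              DW_AT_name ("a.c")
0x0000001e:   DW_TAG_subprogram
"""

print(fragment_ranges(entry_offsets(SAMPLE_DUMP), 0x2E))
```
</pre>
<div>Each (offset, size) pair would then become one extra section
header pointing into the unchanged DWARF bytes; writing those headers
and splitting the relocations to match is left out of the sketch.</div>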
<div><br>
</div>
<div>If Alexey could point me at the latest version of his
patch, I'd be happy to run that through either or both of the
packages I used to see what happens. Equally, I'd be happy if
Alexey is able to run my script to fragment and measure the
performance of a couple of projects he's been working with.
Based purely on the two packages I've tried this with, I can
tell already that the results can vary wildly. My expectation
is that Alexey's approach will be slower (at least in its
current form, but probably more generally) but produce
smaller output, though by how much I have no idea.<br>
</div>
</div>
</blockquote>
<p>My patch is at <a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D74169">https://reviews.llvm.org/D74169</a>. But I think it
needs rebasing. I will rebase/update it in a couple of days. <br>
</p>
<p>I will also examine the "Fragmented DWARF" patch this week to see
the results (James, thank you for sharing this work!). To
compare apples to apples, I guess the D74169 approach should be used
with ODR type de-duplication switched off. I will add that
option to D74169 (though in that case it will be even
slower, but the resulting binary size should then be closer to
the "Fragmented DWARF" results).</p>
<p>Thank you, Alexey.<br>
</p>
<blockquote type="cite"
cite="mid:CABqSp3=ObdqO+2jQpEMwnOL2O9nytD87q9WMDaWQaVwHA4_j_A@mail.gmail.com">
<div dir="ltr">
<div><br>
</div>
<div>I think linkers parse .eh_frame partly because they have no
other choice. That being said, I think its format is not too
complex, so the parser isn't too complex either. You can
see LLD's ELF implementation in ELF/EhFrame.cpp, how it is
used in ELF/InputSection.cpp (see the bits to do with
EhInputSection) and EhFrameSection in ELF/SyntheticSections.h
(plus various usages of these two throughout the LLD code). I
think the key to any structural changes in the DWARF format to
make them more amenable to link-time parsing is being able to
read a minimal amount without needing to parse the payload
(e.g. a length field, some sort of type, and then using the
relocations to associate it accordingly).</div>
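<div>To illustrate what "reading a minimal amount without parsing the
payload" can look like, here is a rough sketch (not LLD's code) that
splits an .eh_frame-style blob into records using only the header
fields. It assumes little-endian data and the usual record layout: a
4-byte length (0 marks the terminator, 0xffffffff introduces an 8-byte
extended length) followed by a 4-byte ID that is zero for a CIE and
non-zero for an FDE. The function name and the test blob are
hypothetical.</div>
<pre>
```python
def split_eh_frame(data):
    """Return (offset, size, kind) per record without decoding any payload."""
    records = []
    pos = 0
    while len(data) - pos >= 4:
        length = int.from_bytes(data[pos:pos + 4], "little")
        if length == 0:
            # A zero length field terminates the section.
            records.append((pos, 4, "terminator"))
            break
        header = 4
        if length == 0xFFFFFFFF:
            # Extended form: the real length is in the next 8 bytes.
            length = int.from_bytes(data[pos + 4:pos + 12], "little")
            header = 12
        # The ID field directly follows the length field(s).
        record_id = int.from_bytes(data[pos + header:pos + header + 4], "little")
        kind = "CIE" if record_id == 0 else "FDE"
        size = header + length
        records.append((pos, size, kind))
        pos += size
    return records


# Hypothetical blob: one CIE, one FDE, then the terminator.
cie = (8).to_bytes(4, "little") + (0).to_bytes(4, "little") + b"\x01\x02\x03\x04"
fde = (12).to_bytes(4, "little") + (16).to_bytes(4, "little") + bytes(8)
blob = cie + fde + bytes(4)

for record in split_eh_frame(blob):
    print(record)
```
</pre>
<div>The sketch does no bounds checking on truncated input and ignores
relocations; a linker would additionally use the relocations on each
FDE to decide which code section the record belongs to.</div>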
<div><br>
</div>
<div>James<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 12 Oct 2020 at 20:48,
David Blaikie &lt;<a href="mailto:dblaikie@gmail.com"
moz-do-not-send="true">dblaikie@gmail.com</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Awesome! Sorry I missed the lightning talk, but
I'm really interested to see this sort of thing (though it's
not directly/immediately applicable to the use case I work
with, Split DWARF, something similar could be used there with
further work).<br>
<br>
Though it looks like the patch has mostly linker changes -
where/how do you generate the fragmented DWARF to begin
with? Via the Python script? Run over assembly? I'd be
surprised if it was achievable that way - curious to know
more.<br>
<br>
Do you have a rough sense of, or are you able to run,
apples-to-apples comparisons with Alexey's linker-based
patches to compare linker time/memory overhead versus
resulting output size gains?<br>
<br>
(&amp; yeah, I'm a bit curious about how the linkers do
eh_frame rewriting, whether the format is especially amenable
to lightweight parsing/rewriting, and how we could make the
DWARF more amenable to that too)</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Oct 12, 2020 at
6:41 AM James Henderson &lt;<a
href="mailto:jh7370.2008@my.bristol.ac.uk"
target="_blank" moz-do-not-send="true">jh7370.2008@my.bristol.ac.uk</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Hi all,</div>
<div><br>
</div>
<div>At the recent LLVM developers' meeting, I presented
a lightning talk on an approach to reduce the amount
of dead debug data left in an executable following
operations such as --gc-sections and duplicate COMDAT
removal. In that presentation, I presented some
figures based on linking a game that had been built by
our downstream clang port and fragmented using the
described approach. Since recording the presentation,
I ran the same experiment on a clang package (this
time built with a GCC version). The comparable figures
are below:</div>
<div><br>
</div>
<div>Link-time speed (s):</div>
<div>
<pre>
+--------------------+-------+---------------+------+------+------+------+------+
| Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
+--------------------+-------+---------------+------+------+------+------+------+
| Game (plain)       |   4.5 |           4.9 |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
| Game (fragmented)  |  11.1 |          11.8 |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
| Clang (plain)      |  13.9 |          17.9 | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
| Clang (fragmented) |  18.6 |          22.8 | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
+--------------------+-------+---------------+------+------+------+------+------+
</pre>
</div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Output size - Game package
(MB):</font></font></div>
<div>
<pre>
+---------------------+-------+------+------+------+------+------+------+
| Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
+---------------------+-------+------+------+------+------+------+------+
| Plain (total)       |  1149 | 1121 | 1017 |  965 |  938 |  930 |  928 |
| Plain (DWARF*)      |   845 |  845 |  845 |  845 |  845 |  845 |  845 |
| Plain (other)       |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
| Fragmented (total)  |  1044 |  940 |  556 |  373 |  287 |  263 |  255 |
| Fragmented (DWARF*) |   740 |  664 |  384 |  253 |  194 |  178 |  173 |
| Fragmented (other)  |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
+---------------------+-------+------+------+------+------+------+------+
</pre>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Output size - Clang
(MB):</font></font></div>
<div>
<pre>
+---------------------+-------+------+------+------+------+------+------+
| Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
+---------------------+-------+------+------+------+------+------+------+
| Plain (total)       |  2596 | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
| Plain (DWARF*)      |  1979 | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
| Plain (other)       |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
| Fragmented (total)  |  2397 | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
| Fragmented (DWARF*) |  1780 | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
| Fragmented (other)  |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
+---------------------+-------+------+------+------+------+------+------+
</pre>
</div>
</div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace">*DWARF size == total size of
.debug_info + .debug_line + .debug_ranges +
.debug_aranges + .debug_loc<br>
</font></div>
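<div>As a worked example of reading the tables above, the following
sketch computes the relative link-time cost of fragmenting at GC 1
and the relative DWARF size change at GC 6. The figures are copied
from the tables; the helper name and the choice of columns are
illustrative assumptions.</div>
<pre>
```python
def percent_change(before, after):
    """Relative change from before to after, in percent, to one decimal place."""
    return round(100.0 * (after - before) / before, 1)


# (plain, fragmented) pairs copied from the tables above.
LINK_TIME_GC1_S = {"Game": (4.9, 11.8), "Clang": (17.9, 22.8)}
DWARF_SIZE_GC6_MB = {"Game": (845, 173), "Clang": (1979, 1691)}

for package in ("Game", "Clang"):
    time_delta = percent_change(*LINK_TIME_GC1_S[package])
    size_delta = percent_change(*DWARF_SIZE_GC6_MB[package])
    print(f"{package}: link time {time_delta:+}%, DWARF size {size_delta:+}%")
```
</pre>
<div>With these inputs the game package links roughly 141% slower
and its DWARF shrinks by roughly 80%, whereas the clang package links
roughly 27% slower and its DWARF shrinks by roughly 15%.</div>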
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Additionally, I have
posted <a href="https://reviews.llvm.org/D89229"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D89229</a>
which provides the python script and linker
patches used to reproduce the above results on my
machine. The GC 1/2/3/4/5/6 columns correspond to the
linker option --mark-live-pc, added in that patch,
with values 1/0.8/0.6/0.4/0.2/0 respectively.<br>
</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif"><br>
</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif">During the conference, the
question was asked what the memory usage and input
size impact was. I've summarised these below:</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif"><br>
</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Input file size total
(GB):</font></font></div>
<div>
<pre>
+--------------------+------------+
| Package variant    | Total Size |
+--------------------+------------+
| Game (plain)       |        2.9 |
| Game (fragmented)  |        4.2 |
| Clang (plain)      |       10.9 |
| Clang (fragmented) |       12.3 |
+--------------------+------------+
</pre>
</div>
<div><span style="font-family:monospace"><br>
</span></div>
<div><span style="font-family:monospace"><span
style="font-family:arial,sans-serif">Peak Working
Set Memory usage (GB):</span><br>
</span></div>
<div><span style="font-family:monospace">
</span>
<div>
<pre>
+--------------------+-------+------+
| Package variant    | No GC | GC 1 |
+--------------------+-------+------+
| Game (plain)       |   4.3 |  4.7 |
| Game (fragmented)  |   8.9 |  8.6 |
| Clang (plain)      |  15.7 | 15.6 |
| Clang (fragmented) |  19.4 | 19.2 |
+--------------------+-------+------+
</pre>
</div>
<div><span style="font-family:monospace"><br>
</span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">I'm keen to hear what
people's feedback is, and also interested to see
what results others might see by running this
experiment on other input packages. Also, if
anybody has any alternative ideas that meet the
goals listed below, I'd love to hear them!<br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif"><br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">To reiterate some key
goals of fragmented DWARF, similar to what I
said in the presentation:</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">1) Devise a scheme that
gives significant size savings without being too
costly. It's clear from just the two packages
I've tried this on that there is a fairly hefty
link time performance cost, although the exact
cost depends on the nature of the input package.
On the other hand, depending on the nature of
the input package, there can also be some big
gains.<br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">2) Devise a scheme that
doesn't require any linker knowledge of DWARF.
The current approach doesn't quite achieve this
properly due to the slight misuse of
SHF_LINK_ORDER, but I expect that a pivot to
using non-COMDAT group sections should solve
this problem.</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">3) Provide some kind of
halfway house between simply writing tombstone
values into dead DWARF and fully parsing the
DWARF to reoptimise it/discard the dead bits.<br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif"><br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">I'm hopeful that changes
could be made to the linker to improve the
link-time cost. A significant amount of the link
time seems to be spent creating the input
sections. An alternative would be to devise a
scheme that would avoid the literal splitting
into section headers, in favour of some sort of
list of split-points that the linker uses to
split things up (a bit like it already does for
.eh_frame or mergeable sections).</font><br>
</span></div>
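<div>A rough sketch of the split-point alternative mentioned above,
under the assumption that the linker receives a sorted list of split
offsets for one input section plus the set of pieces it has decided
are live. All names are hypothetical and this is not how LLD is
implemented; it only shows that pieces can be laid out without
creating a section header per piece.</div>
<pre>
```python
def layout_live_pieces(split_points, section_size, live):
    """Map each piece's input offset to its output offset (None if discarded).

    split_points: sorted start offsets of the pieces, beginning with 0.
    live: set of piece indices that survive garbage collection.
    Returns the offset map and the total size of the kept data.
    """
    bounds = list(split_points) + [section_size]
    offset_map = {}
    cursor = 0
    for index, start in enumerate(split_points):
        size = bounds[index + 1] - start
        if index in live:
            # Live pieces are packed back to back in input order.
            offset_map[start] = cursor
            cursor += size
        else:
            offset_map[start] = None
    return offset_map, cursor


# Hypothetical section of 64 bytes split into three pieces; the middle one is dead.
mapping, kept_size = layout_live_pieces([0, 16, 40], 64, {0, 2})
print(mapping, kept_size)
```
</pre>
<div>Relocations that refer into the section would then be rewritten
through the offset map, much as happens for pieces of mergeable
sections.</div>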
<span style="font-family:monospace">
</span></div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>