<div dir="ltr"><div>Hi Alexey,</div><div><br></div><div>Thanks for taking a look at these. I noticed you set --mark-live-pc to a value other than 1 for the fragmented DWARF version. This means additional GC-ing is done beyond what --gc-sections does, so unless you use the same value for the other versions, the results will not be comparable. (The option is there purely to experiment with the effect of treating different amounts of the input codebase as dead.) Would you be okay to rerun those figures without the option specified?</div><div><br></div><div>I'm still trying to sort out the problems on my end so that I can run your experiment on the game package I used in my presentation, but I have been interrupted by other unrelated issues. I'll try to get back to this in the coming days.</div><div><br></div><div>James<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 4 Nov 2020 at 11:54, Alexey Lapshin &lt;<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi James,<br>
<br>
I ran experiments with the clang code base and will run experiments
with our local codebase later. <br>
Overall, both solutions ("Fragmented DWARF" and "DWARFLinker
without odr types deduplication") appear to give similar size savings
for the final binary. "DWARFLinker with odr types
deduplication" gives a bigger size saving. "Fragmented DWARF"
increases the size of the original object files by 13-18% in these experiments.<br>
LLD with "fragmented DWARF" works significantly faster than with
"DWARFLinker".<br>
<br>
Following are the results for "llvm-strings" and "clang" binaries:<br>
<br>
1. llvm-strings:<br>
<br>
<tt> source object files size: 381M.</tt><tt><br>
</tt><tt> fragmented source object files size: 451M(18%
increase).</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> a. upstream version, </tt><tt><br>
</tt><tt> command line options: --gc-sections</tt><tt><br>
</tt><tt> binary size: 6,5M</tt><tt><br>
</tt><tt> compilation time: 0:00.13 sec</tt><tt><br>
</tt><tt> run-time memory: 111kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> b. "fragmented DWARF" version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--mark-live-pc=0.45</tt><tt><br>
</tt><tt> binary size: 3,7M</tt><tt><br>
</tt><tt> compilation time: 0:00.10 sec</tt><tt><br>
</tt><tt> run-time memory: 122kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> c. DWARFLinker version, </tt><tt><br>
</tt><tt> command line options: --gc-sections --gc-debuginfo</tt><tt><br>
</tt><tt> binary size: 3,8M</tt><tt><br>
</tt><tt> compilation time: 0:00.33 sec</tt><tt><br>
</tt><tt> run-time memory: 141kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> d. DWARFLinker no-odr version, </tt><tt><br>
</tt><tt> command line options: --gc-sections --gc-debuginfo
--gc-debuginfo-no-odr</tt><tt><br>
</tt><tt> binary size: 4,3M</tt><tt><br>
</tt><tt> compilation time: 0:00.38 sec</tt><tt><br>
</tt><tt> run-time memory: 142kb</tt><br>
<br>
<br>
2. clang:<br>
<br>
<tt> source object files size: 6,5G.</tt><tt><br>
</tt><tt> fragmented source object files size: 7,3G(13%
increase).</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> a. upstream version, </tt><tt><br>
</tt><tt> command line options: --gc-sections</tt><tt><br>
</tt><tt> binary size: 1,5G</tt><tt><br>
</tt><tt> compilation time: 6 sec </tt><tt><br>
</tt><tt> run-time memory: 9.7G</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> b. "fragmented DWARF" version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--mark-live-pc=0.43</tt><tt><br>
</tt><tt> binary size: 1,1G</tt><tt><br>
</tt><tt> compilation time: 9 sec</tt><tt><br>
</tt><tt> run-time memory: 11G</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> c. DWARFLinker version, </tt><tt><br>
</tt><tt> command line options: --gc-sections --gc-debuginfo</tt><tt><br>
</tt><tt> binary size: 836M</tt><tt><br>
</tt><tt> compilation time: 62 sec</tt><tt><br>
</tt><tt> run-time memory: 15G</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> d. DWARFLinker no-odr version, </tt><tt><br>
</tt><tt> command line options: --gc-sections --gc-debuginfo
--gc-debuginfo-no-odr</tt><tt><br>
</tt><tt> binary size: 1,3G</tt><tt><br>
</tt><tt> compilation time: 128 sec</tt><tt><br>
</tt><tt> run-time memory: 17G</tt><br>
<br>
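For reference, the relative changes implied by the llvm-strings summary above can be recomputed with a small sketch like the one below. The sizes are copied from this mail (in MB, with decimal commas written as points); the helper name percent_saved is only illustrative. Note that variant b was linked with --mark-live-pc=0.45.<br>

```python
def percent_saved(before, after):
    """Reduction from before to after, as a percentage of before."""
    return (before - after) / before * 100.0

# Object file sizes for llvm-strings, in MB, copied from the figures above.
plain_objects_mb = 381.0
fragmented_objects_mb = 451.0

# Final binary sizes for llvm-strings, in MB (6,5M is written as 6.5 here).
upstream_binary_mb = 6.5
other_binaries_mb = {
    "fragmented DWARF (--mark-live-pc=0.45)": 3.7,
    "DWARFLinker": 3.8,
    "DWARFLinker no-odr": 4.3,
}

# Growth of the inputs is a negative saving, so flip the sign.
object_growth = -percent_saved(plain_objects_mb, fragmented_objects_mb)
print(f"fragmented object files: {object_growth:+.1f}%")

savings = {}
for variant, size in other_binaries_mb.items():
    savings[variant] = percent_saved(upstream_binary_mb, size)
    print(f"{variant}: {size} MB, {savings[variant]:.1f}% smaller than upstream")
```
<br>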
Detailed size results:<br>
<br>
<tt>1. llvm-strings <br>
</tt></p>
<p><tt> a)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 41.1% 2.64Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 24.9% 1.60Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 12.6% 827Ki 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 6.5% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 4.8% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 3.4% 223Ki 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 2.0% 133Ki 19.8% 133Ki .eh_frame</tt><tt><br>
</tt><tt> 1.7% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.2% 77.6Ki 0.0% 0 .debug_abbrev</tt><tt><br>
</tt><tt><br>
</tt><tt> b)</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 50.3% 1.85Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 43.6% 1.60Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 2.6% 98.2Ki 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 2.1% 77.6Ki 0.0% 0 .debug_abbrev</tt><tt><br>
</tt><tt> 0.5% 17.5Ki 54.9% 17.4Ki .text</tt><tt><br>
</tt><tt> 0.3% 9.94Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 0.2% 6.27Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 0.1% 5.09Ki 15.9% 5.03Ki .eh_frame</tt><tt><br>
</tt><tt> 0.1% 3.28Ki 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt><br>
</tt><tt> c)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 33.0% 1.25Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 29.2% 1.11Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 11.0% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 8.2% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 7.8% 304Ki 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 3.4% 133Ki 19.8% 133Ki .eh_frame</tt><tt><br>
</tt><tt> 2.8% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.7% 65.9Ki 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 1.0% 38.4Ki 5.7% 38.4Ki .rodata</tt><tt><br>
</tt><tt><br>
</tt><tt> d)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 39.7% 1.68Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 26.3% 1.11Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 9.9% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 7.3% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 7.0% 304Ki 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 3.1% 133Ki 19.8% 133Ki .eh_frame</tt><tt><br>
</tt><tt> 2.6% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.5% 65.9Ki 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt>2. clang</tt></p>
<p><tt> a)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 58.3% 878Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 11.8% 177Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 7.7% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 7.7% 115Mi 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 6.0% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 2.4% 35.4Mi 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 1.5% 23.3Mi 12.5% 23.3Mi .eh_frame</tt><tt><br>
</tt><tt> 1.5% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 1.2% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt><br>
</tt><tt> b)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 71.5% 772Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 16.5% 177Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 3.7% 40.2Mi 59.2% 40.2Mi .text</tt><tt><br>
</tt><tt> 2.4% 25.8Mi 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 2.1% 23.0Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 1.0% 10.6Mi 15.6% 10.6Mi .dynstr</tt><tt><br>
</tt><tt> 0.7% 7.18Mi 10.6% 7.18Mi .eh_frame</tt><tt><br>
</tt><tt> 0.5% 5.60Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 0.4% 4.28Mi 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 0.4% 4.04Mi 0.0% 0 .debug_abbrev</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt> c)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 35.1% 293Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 21.2% 177Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 13.9% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 10.9% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 6.9% 57.4Mi 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 2.8% 23.3Mi 12.5% 23.3Mi .eh_frame</tt><tt><br>
</tt><tt> 2.8% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 2.1% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.5% 12.4Mi 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 1.3% 10.6Mi 5.7% 10.6Mi .dynstr</tt><tt><br>
</tt><tt><br>
</tt><tt> d)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 58.3% 758Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 13.6% 177Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 8.9% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 7.0% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 4.4% 57.4Mi 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 1.8% 23.3Mi 12.5% 23.3Mi .eh_frame</tt><tt><br>
</tt><tt> 1.8% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 1.4% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.0% 12.4Mi 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 0.8% 10.6Mi 5.7% 10.6Mi .dynstr</tt></p>
<p><tt>Thank you, Alexey.</tt><br>
</p>
<div>On 19.10.2020 11:50, James Henderson
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Great, thanks Alexey! I'll try to take a look at
this in the near future, and will report my results back here. I
imagine our clang results will differ, purely because we
probably used different toolchains to build the input in the
first place.<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, 15 Oct 2020 at 10:08,
Alexey Lapshin &lt;<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 13.10.2020 10:20, James Henderson wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>The script included in the patch can be used to
convert an object containing normal DWARF into an
object using fragmented DWARF. It does this by using
llvm-dwarfdump to dump the various sections, parses
the output to identify where it should split (using
the offsets of the various entries), and then writes
new section headers accordingly - you can see roughly
what it's doing if you get a chance to watch the talk
recording. The additional section headers are appended
to the end of the ELF section header table, whilst the
original DWARF is left in the same place it was before
(making use of the fact that section headers don't
have to appear in offset order). The script also
parses and fragments the relocation sections targeting
the DWARF sections so that they match up with the
fragmented DWARF sections. This is clearly all
suboptimal - in practice the compiler should be
modified to do the fragmenting upfront, to save having
to parse a tool's stdout, but that was just the
simplest thing I could come up with to quickly write
the script. Full details of the script usage are
included in the patch description, if you want to play
around with it.</div>
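<div>As a rough illustration of the "parse the dump to find split points" step described above, here is a minimal Python sketch. It is not the script from the patch: the sample text, the regular expression, and the function name find_split_points are all assumptions made for this example, and the assumed indentation rule (one space for the unit DIE, two more per nesting level) should be checked against real llvm-dwarfdump output. It only looks at .debug_info and ignores relocations and section headers entirely.</div>

```python
import re

# Illustrative text in the style of `llvm-dwarfdump --debug-info` output.
# The offsets and tags here are made up for the example.
SAMPLE_DUMP = """\
0x0000000b: DW_TAG_compile_unit
              DW_AT_name ("a.c")

0x0000002a:   DW_TAG_subprogram
                DW_AT_name ("foo")

0x00000043:     DW_TAG_formal_parameter
                  DW_AT_name ("x")

0x00000051:     NULL

0x00000052:   DW_TAG_subprogram
                DW_AT_name ("bar")

0x0000006b:   DW_TAG_base_type
                DW_AT_name ("int")

0x00000072:   NULL
"""

# A DIE header line: hex offset, colon, indentation, then the tag (or NULL).
DIE_LINE = re.compile(r"^(0x[0-9a-f]+):( +)(DW_TAG_\w+|NULL)")

def find_split_points(dump_text):
    """Return (offset, tag) for each DIE that is a direct child of a unit."""
    points = []
    for line in dump_text.splitlines():
        match = DIE_LINE.match(line)
        if match is None:
            continue  # attribute lines and blank lines are skipped
        offset_text, indent, tag = match.groups()
        # Assumed layout: one space for the unit DIE, two more per level.
        depth = (len(indent) - 1) // 2
        if depth == 1 and tag != "NULL":
            points.append((int(offset_text, 16), tag))
    return points

for offset, tag in find_split_points(SAMPLE_DUMP):
    print(f"split at {offset:#x}: {tag}")
```

<div>Each reported offset marks where a new fragment could begin; nested DIEs such as the formal parameter stay inside their parent's fragment.</div>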
<div><br>
</div>
<div>If Alexey could point me at the latest version of
his patch, I'd be happy to run that through either or
both of the packages I used to see what happens.
Equally, I'd be happy if Alexey is able to run my
script to fragment and measure the performance of a
couple of projects he's been working with. Based
purely on the two packages I've tried this with, I can
tell already that the results can vary wildly. My
expectation is that Alexey's approach will be slower
(at least in its current form, but probably more
generally), but produce smaller output, but to what
scale I have no idea.<br>
</div>
</div>
</blockquote>
<p>James, I updated the patch - <a href="https://reviews.llvm.org/D74169" target="_blank">https://reviews.llvm.org/D74169</a>.</p>
<p>To make it work, the example must be built with
-ffunction-sections, and the following options must be passed to the
linker:</p>
<p>--gc-sections --gc-debuginfo --gc-debuginfo-no-odr</p>
<p>For the clang binary I got the following results:</p>
<p>1. --gc-sections = binary size 1,5G, Debug Info
size(*) 1.2G</p>
<p>2. --gc-sections --gc-debuginfo = binary size 840M, 8x
performance decrease, Debug Info size 542M<br>
</p>
<p>3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr =
binary size 1,3G, 16x performance decrease, Debug Info
size 1G<br>
</p>
<p>(*)
.debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc<br>
</p>
<p><br>
</p>
<p>I added the option --gc-debuginfo-no-odr so that the size
reduction can be compared fairly. Without that option,
D74169 deduplicates types, so it is not correct
to compare the resulting size with the "Fragmented DWARF" solution,
which does not deduplicate types.<br>
</p>
<p>Also, I will look at your <font face="monospace"><font face="arial,sans-serif"><a href="https://reviews.llvm.org/D89229" target="_blank">D89229</a>
and share the results some time later.<br>
</font></font></p>
<p>Thank you, Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I think linkers parse .eh_frame partly because they
have no other choice. That being said, I think its
format is not too complex, so similarly the parser
isn't too complex. You can see LLD's ELF
implementation in ELF/EhFrame.cpp, how it is used in
ELF/InputSection.cpp (see the bits to do with
EhInputSection) and EhFrameSection in
ELF/SyntheticSections.h (plus various usages of these
two throughout the LLD code). I think the key to any
structural changes in the DWARF format to make them
more amenable to link-time parsing is being able to
read a minimal amount without needing to parse the
payload (e.g. a length field, some sort of type, and
then using the relocations to associate it
accordingly).</div>
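<div>To make the "read a minimal amount without parsing the payload" point concrete, here is a small Python sketch that splits .eh_frame-style bytes into records using only the length field and the CIE ID field. It is not LLD's implementation; the function names are hypothetical, the input is hand-made, and little-endian encoding is assumed.</div>

```python
def split_eh_frame_records(data):
    """Split .eh_frame-style bytes into (kind, start, end) records.

    Only the length and ID fields are read; the payload is never parsed.
    """
    records = []
    pos = 0
    while len(data) - pos >= 4:
        start = pos
        length = int.from_bytes(data[pos:pos + 4], "little")
        pos += 4
        if length == 0:
            # A zero length marks the terminator entry.
            records.append(("terminator", start, pos))
            break
        if length == 0xFFFFFFFF:
            # Extended form: the real length follows as a 64-bit value.
            length = int.from_bytes(data[pos:pos + 8], "little")
            pos += 8
        end = pos + length
        if end > len(data):
            raise ValueError(f"record at {start:#x} runs past the section end")
        # The first 4 payload bytes are the CIE ID field: 0 means CIE,
        # anything else is an FDE's pointer back to its CIE.
        cie_id = int.from_bytes(data[pos:pos + 4], "little")
        records.append(("CIE" if cie_id == 0 else "FDE", start, end))
        pos = end
    return records

def make_record(id_field, payload):
    """Build one little-endian record for the demonstration below."""
    body = id_field.to_bytes(4, "little") + payload
    return len(body).to_bytes(4, "little") + body

# Hand-made demo input: one CIE, one FDE, then a terminator.
demo = make_record(0, b"\x01" * 8) + make_record(0x14, b"\x02" * 12) + bytes(4)
for kind, start, end in split_eh_frame_records(demo):
    print(f"{kind}: bytes [{start}, {end})")
```

<div>Associating each FDE with the code it describes would still need the relocations, which this sketch does not look at.</div>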
<div><br>
</div>
<div>James<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 12 Oct 2020 at
20:48, David Blaikie &lt;<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Awesome! Sorry I missed the lightning
talk, but really interested to see this sort of
thing (though it's not directly/immediately
applicable to the use case I work with - Split
DWARF, something similar could be used there with
further work)<br>
<br>
Though it looks like the patch has mostly linker
changes - where/how do you generate the fragmented
DWARF to begin with? Via the Python script? Run over
assembly? I'd be surprised if it was achievable that
way - curious to know more.<br>
<br>
Got a rough sense/are you able to run
apples-to-apples comparisons with Alexey's
linker-based patches to compare linker time/memory
overhead versus resulting output size gains?<br>
<br>
(&amp; yeah, I'm a bit curious about how the linkers
do eh_frame rewriting, if the format is especially
amenable to a lightweight parsing/rewriting and how
we could make the DWARF more amenable to that too)</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Oct 12,
2020 at 6:41 AM James Henderson &lt;<a href="mailto:jh7370.2008@my.bristol.ac.uk" target="_blank">jh7370.2008@my.bristol.ac.uk</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Hi all,</div>
<div><br>
</div>
<div>At the recent LLVM developers' meeting, I
presented a lightning talk on an approach to
reduce the amount of dead debug data left in
an executable following operations such as
--gc-sections and duplicate COMDAT removal. In
that presentation, I presented some figures
based on linking a game that had been built by
our downstream clang port and fragmented using
the described approach. Since recording the
presentation, I ran the same experiment on a
clang package (this time built with a GCC
version). The comparable figures are below:</div>
<div><br>
</div>
<div>Link-time speed (s):</div>
<div><span style="font-family:monospace">+--------------------+-------+---------------+------+------+------+------+------+</span><br>
</div>
<div><font face="monospace">| Package variant
| No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4
| GC 5 | GC 6 |</font></div>
<div><font face="monospace">+--------------------+-------+---------------+------+------+------+------+------+<br>
</font></div>
<div><font face="monospace">| Game (plain)
| 4.5 | 4.9 | 4.2 | 3.6 | 3.4
| 3.3 | 3.2 |<br>
</font></div>
<div><font face="monospace">| Game (fragmented)
| 11.1 | 11.8 | 9.7 | 8.6 | 7.9
| 7.7 | 7.5 |<br>
</font></div>
<div><font face="monospace">| Clang (plain)
| 13.9 | 17.9 | 17.0 | 16.7 | 16.3
| 16.2 | 16.1 |<br>
</font></div>
<div><font face="monospace">| Clang (fragmented)
| 18.6 | 22.8 | 21.6 | 21.1 | 20.8
| 20.5 | 20.2 |</font></div>
<div><font face="monospace">+--------------------+-------+---------------+------+------+------+------+------+</font></div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font face="arial,sans-serif">Output size - Game
package (MB):</font></font></div>
<div><font face="monospace"><font face="arial,sans-serif"><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+</span><br>
</font></font></div>
<div><span style="font-family:monospace">|
Category | No GC | GC 1 | GC 2 |
GC 3 | GC 4 | GC 5 | GC 6 |<br>
</span></div>
<div><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+<br>
</span></div>
<div><span style="font-family:monospace">| Plain
(total) | 1149 | 1121 | 1017 | 965
| 938 | 930 | 928 |<br>
</span></div>
<div><span style="font-family:monospace">| Plain
(DWARF*) | 845 | 845 | 845 | 845
| 845 | 845 | 845 |<br>
</span></div>
<div><span style="font-family:monospace">| Plain
(other) | 304 | 276 | 172 | 120
| 93 | 85 | 82 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (total) | 1044 | 940 | 556 |
373 | 287 | 263 | 255 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (DWARF*) | 740 | 664 | 384 |
253 | 194 | 178 | 173 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (other) | 304 | 276 | 172 |
120 | 93 | 85 | 82 |<br>
</span></div>
<div><font face="monospace">+---------------------+-------+------+------+------+------+------+------+
</font>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font face="arial,sans-serif">Output size -
Clang (MB):</font></font></div>
<div><font face="monospace"><font face="arial,sans-serif"><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+</span><br>
</font></font></div>
<div><span style="font-family:monospace">|
Category | No GC | GC 1 | GC 2
| GC 3 | GC 4 | GC 5 | GC 6 |<br>
</span></div>
<div><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+<br>
</span></div>
<div><span style="font-family:monospace">|
Plain (total) | 2596 | 2546 | 2406
| 2332 | 2293 | 2273 | 2251 |<br>
</span></div>
<div><span style="font-family:monospace">|
Plain (DWARF*) | 1979 | 1979 | 1979
| 1979 | 1979 | 1979 | 1979 |<br>
</span></div>
<div><span style="font-family:monospace">|
Plain (other) | 616 | 567 | 426
| 353 | 314 | 294 | 272 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (total) | 2397 | 2346 | 2164
| 2069 | 2017 | 1990 | 1963 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (DWARF*) | 1780 | 1780 | 1738
| 1716 | 1703 | 1696 | 1691 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (other) | 616 | 567 | 426
| 353 | 314 | 294 | 272 |<br>
</span></div>
<div><font face="monospace">+---------------------+-------+------+------+------+------+------+------+</font></div>
</div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace">*DWARF size == total
size of .debug_info + .debug_line +
.debug_ranges + .debug_aranges + .debug_loc<br>
</font></div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font face="arial,sans-serif">Additionally, I
have posted <a href="https://reviews.llvm.org/D89229" target="_blank">https://reviews.llvm.org/D89229</a>
which provides the python script and
linker patches used to reproduce the above
results on my machine. The GC 1/2/3/4/5/6
correspond to the linker option added in
that patch --mark-live-pc with values
1/0.8/0.6/0.4/0.2/0 respectively.<br>
</font></font></div>
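<div>As a worked example of reading the tables above, the following Python sketch pairs each GC column with its --mark-live-pc value and derives two ratios from the Game package rows: the DWARF size reduction from fragmenting, and the link-time cost at GC 1. All numbers are copied from the tables in this mail; the variable names are only illustrative.</div>

```python
# GC column -> --mark-live-pc value, as stated above.
mark_live_pc = {"GC 1": 1.0, "GC 2": 0.8, "GC 3": 0.6,
                "GC 4": 0.4, "GC 5": 0.2, "GC 6": 0.0}

# Rows copied from the "Output size - Game package (MB)" table.
columns = ["GC 1", "GC 2", "GC 3", "GC 4", "GC 5", "GC 6"]
plain_dwarf_mb = [845, 845, 845, 845, 845, 845]
fragmented_dwarf_mb = [664, 384, 253, 194, 178, 173]

# Link times (s) at GC 1 as (plain, fragmented), from the link-time table.
link_time_s = {"Game": (4.9, 11.8), "Clang": (17.9, 22.8)}

dwarf_saving_pct = {}
for column, plain, fragmented in zip(columns, plain_dwarf_mb, fragmented_dwarf_mb):
    dwarf_saving_pct[column] = (plain - fragmented) / plain * 100.0
    print(f"{column} (--mark-live-pc={mark_live_pc[column]}): "
          f"DWARF {dwarf_saving_pct[column]:.1f}% smaller when fragmented")

slowdown = {}
for package, (plain_s, fragmented_s) in link_time_s.items():
    slowdown[package] = fragmented_s / plain_s
    print(f"{package}: fragmented link takes {slowdown[package]:.2f}x as long at GC 1")
```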
<div><font face="monospace"><font face="arial,sans-serif"><br>
</font></font></div>
<div><font face="monospace"><font face="arial,sans-serif">During the
conference, the question was asked what
the memory usage and input size impact
was. I've summarised these below:</font></font></div>
<div><font face="monospace"><font face="arial,sans-serif"><br>
</font></font></div>
<div><font face="monospace"><font face="arial,sans-serif">Input file size
total (GB):</font></font></div>
<div><font face="monospace"><font face="arial,sans-serif"> <span style="font-family:monospace">+--------------------+------------+
</span></font></font></div>
<div><span style="font-family:monospace"> |
Package variant | Total Size | <br>
</span></div>
<div><span style="font-family:monospace">
+--------------------+------------+<br>
</span></div>
<div><span style="font-family:monospace"> | Game
(plain) | 2.9 | <br>
</span></div>
<div><span style="font-family:monospace"> | Game
(fragmented) | 4.2 |<br>
</span></div>
<div><span style="font-family:monospace"> |
Clang (plain) | 10.9 |<br>
</span></div>
<div><span style="font-family:monospace"> |
Clang (fragmented) | 12.3 |<br>
</span></div>
<div><span style="font-family:monospace">
+--------------------+------------+</span></div>
<div><span style="font-family:monospace"><br>
</span></div>
<div><span style="font-family:monospace"><span style="font-family:arial,sans-serif">Peak
Working Set Memory usage (GB):</span><br>
</span></div>
<div><span style="font-family:monospace"> </span>
<div><font face="monospace"><font face="arial,sans-serif"><span style="font-family:monospace">+--------------------+-------+------+
</span></font></font></div>
<div><span style="font-family:monospace"> |
Package variant | No GC | GC 1 |<br>
</span></div>
<div><span style="font-family:monospace">
+--------------------+-------+------+<br>
</span></div>
<div><span style="font-family:monospace"> |
Game (plain) | 4.3 | 4.7 |<br>
</span></div>
<div><span style="font-family:monospace"> |
Game (fragmented) | 8.9 | 8.6 |<br>
</span></div>
<div><span style="font-family:monospace"> |
Clang (plain) | 15.7 | 15.6 |<br>
</span></div>
<div><span style="font-family:monospace"> |
Clang (fragmented) | 19.4 | 19.2 |<br>
</span></div>
<div><span style="font-family:monospace">
+--------------------+-------+------+</span></div>
<div><span style="font-family:monospace"><br>
</span></div>
<div><span style="font-family:monospace"><font face="arial,sans-serif">I'm keen to hear
what people's feedback is, and also
interested to see what results others
might see by running this experiment on
other input packages. Also, if anybody
has any alternative ideas that meet the
goals listed below, I'd love to hear
them!<br>
</font></span></div>
<div><span style="font-family:monospace"><font face="arial,sans-serif"><br>
</font></span></div>
<div><span style="font-family:monospace"><font face="arial,sans-serif">To reiterate
some key goals of fragmented DWARF,
similar to what I said in the
presentation:</font></span></div>
<div><span style="font-family:monospace"><font face="arial,sans-serif">1) Devise a
scheme that gives significant size
savings without being too costly. It's
clear from just the two packages I've
tried this on that there is a fairly
hefty link time performance cost,
although the exact cost depends on the
nature of the input package. On the
other hand, depending on the nature of
the input package, there can also be
some big gains.<br>
</font></span></div>
<div><span style="font-family:monospace"><font face="arial,sans-serif">2) Devise a
scheme that doesn't require any linker
knowledge of DWARF. The current approach
doesn't quite achieve this properly due
to the slight misuse of SHF_LINK_ORDER,
but I expect that a pivot to using
non-COMDAT group sections should solve
this problem.</font></span></div>
<div><span style="font-family:monospace"><font face="arial,sans-serif">3) Provide some
kind of halfway house between simply
writing tombstone values into dead DWARF
and fully parsing the DWARF to
reoptimise it/discard the dead bits.<br>
</font></span></div>
<div><span style="font-family:monospace"><font face="arial,sans-serif"><br>
</font></span></div>
<div><span style="font-family:monospace"><font face="arial,sans-serif">I'm hopeful that
changes could be made to the linker to
improve the link-time cost. There seems
to be a significant amount of the link
time spent creating the input sections.
An alternative would be to devise a
scheme that would avoid the literal
splitting into section headers, in
favour of some sort of list of
split-points that the linker uses to
split things up (a bit like it already
does for .eh_frame or mergeable
sections).</font><br>
</span></div>
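<div>A minimal Python sketch of the "list of split-points" alternative might look like the following. This is only an illustration of the idea, not a proposed linker design: the function names split_at and keep_live are hypothetical, and how the linker would decide which pieces are live is left as an assumption (here it is simply a set of start offsets).</div>

```python
def split_at(section, split_points):
    """Cut section bytes into (start, data) pieces at the given offsets.

    split_points must be sorted, unique, and lie inside the section.
    """
    bounds = [0, *split_points, len(section)]
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        if end > start:
            pieces.append((start, section[start:end]))
    return pieces

def keep_live(pieces, live_offsets):
    """Concatenate only the pieces whose start offset is marked live."""
    return b"".join(data for start, data in pieces if start in live_offsets)

# Toy section: three pieces of 4, 6 and 2 bytes.
section = b"AAAA" + b"BBBBBB" + b"CC"
pieces = split_at(section, [4, 10])
output = keep_live(pieces, {0, 10})
print(output)
```

<div>The section stays a single input section with one header; only the list of offsets tells the linker where it may cut.</div>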
<span style="font-family:monospace"> </span></div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div>