<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 04.11.2020 16:57, James Henderson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CABqSp3=ekEcy5sm+cFrytyijuYoGME=obDRcNkLQEsppi2hthg@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div>Great, thanks! Those results are roughly what I was
expecting. I assume "compilation time" is actually just the
link time?</div>
</div>
</blockquote>
<p>Yep, that is link time.<br>
</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:CABqSp3=ekEcy5sm+cFrytyijuYoGME=obDRcNkLQEsppi2hthg@mail.gmail.com">
<div dir="ltr">
<div><br>
</div>
<div>I find it particularly interesting that the DWARFLinker
rewriting solution produces the same size improvement in
.debug_line as the fragmented DWARF approach. That suggests
that in that case, fragmented DWARF output is probably about
as optimal as it can get. I'm not surprised that the same
can't be said for other sections, but I'm also pleased to see
that the full rewrite option isn't that much better in terms of size
improvements.</div>
<div><br>
</div>
<div>Regarding the problems I was having with the patch, if you
want to try reproducing the problems with clang, I built
commit 05d02e5a of clang using gcc 7.5.0 on Ubuntu 18.04, to
generate an ELF package. I then used LLD to relink it to
create a reproducible package. As I'm primarily a Windows
developer, I transferred this package to my Windows machine so
that I could use my existing Windows checkout of LLVM, applied
your patch, rebuilt LLD, and used that to try linking the
package, getting the stated message. I'm going to have another
try at the latter now to see if I can figure out what the
issue is myself.</div>
<div><br>
</div>
<div>James<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, 4 Nov 2020 at 13:35,
Alexey Lapshin &lt;<a href="mailto:avl.lapshin@gmail.com"
moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 04.11.2020 15:28, James Henderson wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi Alexey,</div>
<div><br>
</div>
<div>Thanks for taking a look at these. I noticed you
set the --mark-live-pc value to a value other than 1
for the fragmented DWARF version. This will mean
additional GC-ing will be done beyond the amount that
--gc-sections will do, so unless you use the same
value for the option for the other versions, the results
will not be comparable. (The option is purely there to
experiment with the effect of treating different amounts of
the input codebase as dead.) Would you
be able to run those figures again without the option
specified?</div>
</div>
</blockquote>
<p>Oh, I misinterpreted that option. The updated
results follow:<br>
</p>
<tt>1. llvm-strings:</tt><tt><br>
</tt><tt><br>
</tt><tt> source object files size: 381M.</tt><tt><br>
</tt><tt> fragmented source object files size: 451M(18%
increase).</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> a. upstream version, </tt><tt><br>
</tt><tt> command line options: --gc-sections</tt><tt><br>
</tt><tt> binary size: 6,5M</tt><tt><br>
</tt><tt> compilation time: 0:00.13 sec</tt><tt><br>
</tt><tt> run-time memory: 111kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> b. "fragmented DWARF" version, </tt><tt><br>
</tt><tt> command line options: --gc-sections</tt><tt><br>
</tt><tt> binary size: 5,3M</tt><tt><br>
</tt><tt> compilation time: 0:00.11 sec</tt><tt><br>
</tt><tt> run-time memory: 125kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> c. DWARFLinker version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--gc-debuginfo</tt><tt><br>
</tt><tt> binary size: 3,8M</tt><tt><br>
</tt><tt> compilation time: 0:00.33 sec</tt><tt><br>
</tt><tt> run-time memory: 141kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> d. DWARFLinker no-odr version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--gc-debuginfo --gc-debuginfo-no-odr</tt><tt><br>
</tt><tt> binary size: 4,3M</tt><tt><br>
</tt><tt> compilation time: 0:00.38 sec</tt><tt><br>
</tt><tt> run-time memory: 142kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt><br>
</tt><tt>2. clang:</tt><tt><br>
</tt><tt><br>
</tt><tt> source object files size: 6,5G.</tt><tt><br>
</tt><tt> fragmented source object files size: 7,3G(13%
increase).</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> a. upstream version, </tt><tt><br>
</tt><tt> command line options: --gc-sections</tt><tt><br>
</tt><tt> binary size: 1,5G</tt><tt><br>
</tt><tt> compilation time: 6 sec </tt><tt><br>
</tt><tt> run-time memory: 9.7G</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> b. "fragmented DWARF" version, </tt><tt><br>
</tt><tt> command line options: --gc-sections </tt><tt><br>
</tt><tt> binary size: 1,4G</tt><tt><br>
</tt><tt> compilation time: 8 sec</tt><tt><br>
</tt><tt> run-time memory: 12G </tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> c. DWARFLinker version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--gc-debuginfo</tt><tt><br>
</tt><tt> binary size: 836M</tt><tt><br>
</tt><tt> compilation time: 62 sec</tt><tt><br>
</tt><tt> run-time memory: 15G</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> d. DWARFLinker no-odr version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--gc-debuginfo --gc-debuginfo-no-odr</tt><tt><br>
</tt><tt> binary size: 1,3G</tt><tt><br>
</tt><tt> compilation time: 128 sec</tt><tt><br>
</tt><tt> run-time memory: 17G</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt>Detailed size results:</tt><tt><br>
</tt><tt><br>
</tt><tt>1. a)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 41.1% 2.64Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 24.9% 1.60Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 12.6% 827Ki 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 6.5% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 4.8% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 3.4% 223Ki 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 2.0% 133Ki 19.8% 133Ki .eh_frame</tt><tt><br>
</tt><tt> 1.7% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.2% 77.6Ki 0.0% 0 .debug_abbrev</tt><tt><br>
</tt><tt><br>
</tt><tt> b)</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 40.2% 2.10Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 30.7% 1.60Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 8.0% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 5.9% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 5.9% 313Ki 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 2.5% 133Ki 19.8% 133Ki .eh_frame</tt><tt><br>
</tt><tt> 2.1% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.5% 77.6Ki 0.0% 0 .debug_abbrev</tt><tt><br>
</tt><tt> 1.3% 69.2Ki 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt><br>
</tt><tt> c)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 33.0% 1.25Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 29.2% 1.11Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 11.0% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 8.2% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 7.8% 304Ki 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 3.4% 133Ki 19.8% 133Ki .eh_frame</tt><tt><br>
</tt><tt> 2.8% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.7% 65.9Ki 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 1.0% 38.4Ki 5.7% 38.4Ki .rodata</tt><tt><br>
</tt><tt><br>
</tt><tt> d)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 39.7% 1.68Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 26.3% 1.11Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 9.9% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 7.3% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 7.0% 304Ki 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 3.1% 133Ki 19.8% 133Ki .eh_frame</tt><tt><br>
</tt><tt> 2.6% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.5% 65.9Ki 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt>2. a)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 58.3% 878Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 11.8% 177Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 7.7% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 7.7% 115Mi 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 6.0% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 2.4% 35.4Mi 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 1.5% 23.3Mi 12.5% 23.3Mi .eh_frame</tt><tt><br>
</tt><tt> 1.5% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 1.2% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt><br>
</tt><tt> b)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 59.6% 807Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 13.1% 177Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 8.5% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 6.7% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 4.2% 57.4Mi 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 1.7% 23.3Mi 12.5% 23.3Mi .eh_frame</tt><tt><br>
</tt><tt> 1.7% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 1.3% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.0% 13.0Mi 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 0.8% 10.6Mi 5.7% 10.6Mi .dynstr</tt><tt><br>
</tt><tt><br>
</tt><tt> c)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 35.1% 293Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 21.2% 177Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 13.9% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 10.9% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 6.9% 57.4Mi 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 2.8% 23.3Mi 12.5% 23.3Mi .eh_frame</tt><tt><br>
</tt><tt> 2.8% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 2.1% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.5% 12.4Mi 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 1.3% 10.6Mi 5.7% 10.6Mi .dynstr</tt><tt><br>
</tt><tt><br>
</tt><tt> d)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 58.3% 758Mi 0.0% 0 .debug_info</tt><tt><br>
</tt><tt> 13.6% 177Mi 0.0% 0 .debug_str</tt><tt><br>
</tt><tt> 8.9% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 7.0% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 4.4% 57.4Mi 0.0% 0 .debug_line</tt><tt><br>
</tt><tt> 1.8% 23.3Mi 12.5% 23.3Mi .eh_frame</tt><tt><br>
</tt><tt> 1.8% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 1.4% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.0% 12.4Mi 0.0% 0 .debug_ranges</tt><tt><br>
</tt><tt> 0.8% 10.6Mi 5.7% 10.6Mi .dynstr</tt>
<p><br>
</p>
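<p>To put the clang binary sizes above side by side, the saving of each variant relative to the upstream link can be computed with a few lines of Python. This is only a sketch: the sizes are copied from the summary above, the "1,5G" style figures are assumed to mean 1500M and so on, and the function names are made up for illustration.</p>
<pre>
```python
# Binary sizes for the clang link, copied from the summary above.
# Assumption: "1,5G" style figures are read as 1500M, and so on.
CLANG_SIZES_MB = {
    "upstream": 1500,
    "fragmented DWARF": 1400,
    "DWARFLinker": 836,
    "DWARFLinker no-odr": 1300,
}


def saving_percent(baseline_mb, variant_mb):
    """Return the size saving of a variant relative to the baseline, in percent."""
    return 100.0 * (baseline_mb - variant_mb) / baseline_mb


def savings_table(sizes, baseline="upstream"):
    """Map each non-baseline variant to its saving against the baseline."""
    base = sizes[baseline]
    return {
        name: round(saving_percent(base, size), 1)
        for name, size in sizes.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, pct in savings_table(CLANG_SIZES_MB).items():
        print(f"{name}: {pct}% smaller than upstream")
```
</pre>
<p>With the rounded figures above this gives roughly 7% for "fragmented DWARF", 44% for DWARFLinker and 13% for the no-odr variant.</p>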
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I'm still trying to figure out the problems on my
end to try running your experiment on the game package
I used in my presentation, but have been interrupted
by other unrelated issues. I'll try to get back to
this in the coming days.</div>
<div><br>
</div>
<div>James<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, 4 Nov 2020 at
11:54, Alexey Lapshin &lt;<a
href="mailto:avl.lapshin@gmail.com" target="_blank"
moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi James,<br>
<br>
I ran experiments with the clang code base and
will run experiments with our local codebase later.
<br>
Overall, both solutions ("Fragmented DWARF" and
"DWARFLinker without ODR type deduplication")
appear to give similar size savings for the
final binary. "DWARFLinker with ODR type
deduplication" gives a bigger size saving.
"Fragmented DWARF" increases the size of the original
object files by up to 15%.<br>
LLD with "fragmented DWARF" runs significantly
faster than with "DWARFLinker".<br>
<br>
The results for the "llvm-strings" and
"clang" binaries follow:<br>
<br>
1. llvm-strings:<br>
<br>
<tt> source object files size: 381M.</tt><tt><br>
</tt><tt> fragmented source object files size:
451M(18% increase).</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> a. upstream version, </tt><tt><br>
</tt><tt> command line options: --gc-sections</tt><tt><br>
</tt><tt> binary size: 6,5M</tt><tt><br>
</tt><tt> compilation time: 0:00.13 sec</tt><tt><br>
</tt><tt> run-time memory: 111kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> b. "fragmented DWARF" version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--mark-live-pc=0.45</tt><tt><br>
</tt><tt> binary size: 3,7M</tt><tt><br>
</tt><tt> compilation time: 0:00.10 sec</tt><tt><br>
</tt><tt> run-time memory: 122kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> c. DWARFLinker version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--gc-debuginfo</tt><tt><br>
</tt><tt> binary size: 3,8M</tt><tt><br>
</tt><tt> compilation time: 0:00.33 sec</tt><tt><br>
</tt><tt> run-time memory: 141kb</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> d. DWARFLinker no-odr version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--gc-debuginfo --gc-debuginfo-no-odr</tt><tt><br>
</tt><tt> binary size: 4,3M</tt><tt><br>
</tt><tt> compilation time: 0:00.38 sec</tt><tt><br>
</tt><tt> run-time memory: 142kb</tt><br>
<br>
<br>
2. clang:<br>
<br>
<tt> source object files size: 6,5G.</tt><tt><br>
</tt><tt> fragmented source object files size:
7,3G(13% increase).</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> a. upstream version, </tt><tt><br>
</tt><tt> command line options: --gc-sections</tt><tt><br>
</tt><tt> binary size: 1,5G</tt><tt><br>
</tt><tt> compilation time: 6 sec </tt><tt><br>
</tt><tt> run-time memory: 9.7G</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> b. "fragmented DWARF" version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--mark-live-pc=0.43</tt><tt><br>
</tt><tt> binary size: 1,1G</tt><tt><br>
</tt><tt> compilation time: 9 sec</tt><tt><br>
</tt><tt> run-time memory: 11G</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> c. DWARFLinker version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--gc-debuginfo</tt><tt><br>
</tt><tt> binary size: 836M</tt><tt><br>
</tt><tt> compilation time: 62 sec</tt><tt><br>
</tt><tt> run-time memory: 15G</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> d. DWARFLinker no-odr version, </tt><tt><br>
</tt><tt> command line options: --gc-sections
--gc-debuginfo --gc-debuginfo-no-odr</tt><tt><br>
</tt><tt> binary size: 1,3G</tt><tt><br>
</tt><tt> compilation time: 128 sec</tt><tt><br>
</tt><tt> run-time memory: 17G</tt><br>
<br>
Detailed size results:<br>
<br>
<tt>1. llvm-strings <br>
</tt></p>
<p><tt> a)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 41.1% 2.64Mi 0.0% 0
.debug_info</tt><tt><br>
</tt><tt> 24.9% 1.60Mi 0.0% 0
.debug_str</tt><tt><br>
</tt><tt> 12.6% 827Ki 0.0% 0
.debug_line</tt><tt><br>
</tt><tt> 6.5% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 4.8% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 3.4% 223Ki 0.0% 0
.debug_ranges</tt><tt><br>
</tt><tt> 2.0% 133Ki 19.8% 133Ki
.eh_frame</tt><tt><br>
</tt><tt> 1.7% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.2% 77.6Ki 0.0% 0
.debug_abbrev</tt><tt><br>
</tt><tt><br>
</tt><tt> b)</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 50.3% 1.85Mi 0.0% 0
.debug_info</tt><tt><br>
</tt><tt> 43.6% 1.60Mi 0.0% 0
.debug_str</tt><tt><br>
</tt><tt> 2.6% 98.2Ki 0.0% 0
.debug_line</tt><tt><br>
</tt><tt> 2.1% 77.6Ki 0.0% 0
.debug_abbrev</tt><tt><br>
</tt><tt> 0.5% 17.5Ki 54.9% 17.4Ki .text</tt><tt><br>
</tt><tt> 0.3% 9.94Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 0.2% 6.27Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 0.1% 5.09Ki 15.9% 5.03Ki
.eh_frame</tt><tt><br>
</tt><tt> 0.1% 3.28Ki 0.0% 0
.debug_ranges</tt><tt><br>
</tt><tt><br>
</tt><tt> c)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 33.0% 1.25Mi 0.0% 0
.debug_info</tt><tt><br>
</tt><tt> 29.2% 1.11Mi 0.0% 0
.debug_str</tt><tt><br>
</tt><tt> 11.0% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 8.2% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 7.8% 304Ki 0.0% 0
.debug_line</tt><tt><br>
</tt><tt> 3.4% 133Ki 19.8% 133Ki
.eh_frame</tt><tt><br>
</tt><tt> 2.8% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.7% 65.9Ki 0.0% 0
.debug_ranges</tt><tt><br>
</tt><tt> 1.0% 38.4Ki 5.7% 38.4Ki .rodata</tt><tt><br>
</tt><tt><br>
</tt><tt> d)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 39.7% 1.68Mi 0.0% 0
.debug_info</tt><tt><br>
</tt><tt> 26.3% 1.11Mi 0.0% 0
.debug_str</tt><tt><br>
</tt><tt> 9.9% 428Ki 63.8% 428Ki .text</tt><tt><br>
</tt><tt> 7.3% 317Ki 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 7.0% 304Ki 0.0% 0
.debug_line</tt><tt><br>
</tt><tt> 3.1% 133Ki 19.8% 133Ki
.eh_frame</tt><tt><br>
</tt><tt> 2.6% 110Ki 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.5% 65.9Ki 0.0% 0
.debug_ranges</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt>2. clang</tt></p>
<p><tt> a)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 58.3% 878Mi 0.0% 0
.debug_info</tt><tt><br>
</tt><tt> 11.8% 177Mi 0.0% 0
.debug_str</tt><tt><br>
</tt><tt> 7.7% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 7.7% 115Mi 0.0% 0
.debug_line</tt><tt><br>
</tt><tt> 6.0% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 2.4% 35.4Mi 0.0% 0
.debug_ranges</tt><tt><br>
</tt><tt> 1.5% 23.3Mi 12.5% 23.3Mi
.eh_frame</tt><tt><br>
</tt><tt> 1.5% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 1.2% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt><br>
</tt><tt> b)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 71.5% 772Mi 0.0% 0
.debug_info</tt><tt><br>
</tt><tt> 16.5% 177Mi 0.0% 0
.debug_str</tt><tt><br>
</tt><tt> 3.7% 40.2Mi 59.2% 40.2Mi .text</tt><tt><br>
</tt><tt> 2.4% 25.8Mi 0.0% 0
.debug_line</tt><tt><br>
</tt><tt> 2.1% 23.0Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 1.0% 10.6Mi 15.6% 10.6Mi .dynstr</tt><tt><br>
</tt><tt> 0.7% 7.18Mi 10.6% 7.18Mi
.eh_frame</tt><tt><br>
</tt><tt> 0.5% 5.60Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 0.4% 4.28Mi 0.0% 0
.debug_ranges</tt><tt><br>
</tt><tt> 0.4% 4.04Mi 0.0% 0
.debug_abbrev</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt> c)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 35.1% 293Mi 0.0% 0
.debug_info</tt><tt><br>
</tt><tt> 21.2% 177Mi 0.0% 0
.debug_str</tt><tt><br>
</tt><tt> 13.9% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 10.9% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 6.9% 57.4Mi 0.0% 0
.debug_line</tt><tt><br>
</tt><tt> 2.8% 23.3Mi 12.5% 23.3Mi
.eh_frame</tt><tt><br>
</tt><tt> 2.8% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 2.1% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.5% 12.4Mi 0.0% 0
.debug_ranges</tt><tt><br>
</tt><tt> 1.3% 10.6Mi 5.7% 10.6Mi .dynstr</tt><tt><br>
</tt><tt><br>
</tt><tt> d)</tt><tt><br>
</tt><tt><br>
</tt><tt> FILE SIZE VM SIZE </tt><tt><br>
</tt><tt> -------------- -------------- </tt><tt><br>
</tt><tt> 58.3% 758Mi 0.0% 0
.debug_info</tt><tt><br>
</tt><tt> 13.6% 177Mi 0.0% 0
.debug_str</tt><tt><br>
</tt><tt> 8.9% 115Mi 62.2% 115Mi .text</tt><tt><br>
</tt><tt> 7.0% 90.7Mi 0.0% 0 .strtab</tt><tt><br>
</tt><tt> 4.4% 57.4Mi 0.0% 0
.debug_line</tt><tt><br>
</tt><tt> 1.8% 23.3Mi 12.5% 23.3Mi
.eh_frame</tt><tt><br>
</tt><tt> 1.8% 23.0Mi 12.4% 23.0Mi .rodata</tt><tt><br>
</tt><tt> 1.4% 17.9Mi 0.0% 0 .symtab</tt><tt><br>
</tt><tt> 1.0% 12.4Mi 0.0% 0
.debug_ranges</tt><tt><br>
</tt><tt> 0.8% 10.6Mi 5.7% 10.6Mi .dynstr</tt></p>
<p><tt>Thank you, Alexey.</tt><br>
</p>
<div>On 19.10.2020 11:50, James Henderson wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Great, thanks Alexey! I'll try to
take a look at this in the near future, and will
report my results back here. I imagine our clang
results will differ, purely because we probably
used different toolchains to build the input in
the first place.<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, 15 Oct
2020 at 10:08, Alexey Lapshin &lt;<a
href="mailto:avl.lapshin@gmail.com"
target="_blank" moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 13.10.2020 10:20, James Henderson
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>The script included in the patch
can be used to convert an object
containing normal DWARF into an object
using fragmented DWARF. It does this
by using llvm-dwarfdump to dump the
various sections, parsing the output to
identify where it should split (using
the offsets of the various entries),
and then writing new section headers
accordingly - you can see roughly what
it's doing if you get a chance to
watch the talk recording. The
additional section headers are
appended to the end of the ELF section
header table, whilst the original
DWARF is left in the same place it was
before (making use of the fact that
section headers don't have to appear
in offset order). The script also
parses and fragments the relocation
sections targeting the DWARF sections
so that they match up with the
fragmented DWARF sections. This is
clearly all suboptimal - in practice
the compiler should be modified to do
the fragmenting upfront, to save
having to parse a tool's stdout, but
that was just the simplest thing I
could come up with to quickly write
the script. Full details of the script
usage are included in the patch
description, if you want to play
around with it.</div>
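<div><br>
</div>
<div>As a rough illustration of the splitting step only (this is not the script from the patch), assuming the entry offsets have already been extracted from the llvm-dwarfdump output, the fragment boundaries could be computed along these lines. The function name and the sample offsets are hypothetical.</div>
<pre>
```python
def split_into_fragments(entry_offsets, section_size):
    """Turn entry start offsets into (offset, size) fragments covering the section.

    Hypothetical helper: each fragment runs from one entry's start offset
    to the start of the next entry, or to the end of the section.
    """
    offsets = sorted(set(entry_offsets))
    if not offsets:
        return [(0, section_size)] if section_size else []
    if offsets[0] != 0:
        # Keep any leading bytes (for example a header) as their own fragment.
        offsets.insert(0, 0)
    ends = offsets[1:] + [section_size]
    return [(start, end - start) for start, end in zip(offsets, ends)]


if __name__ == "__main__":
    # Made-up offsets for a 100-byte section.
    print(split_into_fragments([11, 40, 72], 100))
```
</pre>
<div>Each (offset, size) pair would then correspond to one additional section header pointing into the original, unmoved section data.</div>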
<div><br>
</div>
<div>If Alexey could point me at the
latest version of his patch, I'd be
happy to run that through either or
both of the packages I used to see
what happens. Equally, I'd be happy if
Alexey is able to run my script to
fragment and measure the performance
of a couple of projects he's been
working with. Based purely on the two
packages I've tried this with, I can
tell already that the results can vary
wildly. My expectation is that
Alexey's approach will be slower (at
least in its current form, but
probably more generally), but produce
smaller output, but to what scale I
have no idea.<br>
</div>
</div>
</blockquote>
<p>James, I updated the patch - <a
href="https://reviews.llvm.org/D74169"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D74169</a>.</p>
<p>To make it work, it is necessary to
build the example with -ffunction-sections and
pass the following options to the linker:</p>
<p>--gc-sections --gc-debuginfo
--gc-debuginfo-no-odr</p>
<p>For the clang binary I got the following results:</p>
<p>1. --gc-sections = binary size 1,5G,
Debug Info size (*) 1.2G</p>
<p>2. --gc-sections --gc-debuginfo = binary
size 840M, 8x performance decrease, Debug
Info size 542M<br>
</p>
<p>3. --gc-sections --gc-debuginfo
--gc-debuginfo-no-odr = binary size 1,3G,
16x performance decrease, Debug Info size
1G<br>
</p>
<p>(*)
.debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc<br>
</p>
<p><br>
</p>
<p>I added the --gc-debuginfo-no-odr option so
that the size reduction can be compared
fairly. Without that option, D74169 performs
type deduplication, and it would not be
correct to compare the resulting size with the
"Fragmented DWARF" solution, which does not
deduplicate types.<br>
</p>
<p>Also, I am looking at your <font
face="monospace"><font
face="arial,sans-serif"><a
href="https://reviews.llvm.org/D89229"
target="_blank"
moz-do-not-send="true">D89229</a>
and will share results some time
later.<br>
</font></font></p>
<p>Thank you, Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I think linkers parse .eh_frame
partly because they have no other
choice. That being said, I think its
format is not too complex, so
similarly the parser isn't too
complex. You can see LLD's ELF
implementation in ELF/EhFrame.cpp, how
it is used in ELF/InputSection.cpp
(see the bits to do with
EhInputSection) and EhFrameSection in
ELF/SyntheticSections.h (plus various
usages of these two throughout the LLD
code). I think the key to any
structural changes in the DWARF format
to make them more amenable to
link-time parsing is being able to
read a minimal amount without needing
to parse the payload (e.g. a length
field, some sort of type, and then
using the relocations to associate it
accordingly).</div>
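<div><br>
</div>
<div>To show what reading a minimal amount without parsing the payload can look like, here is a small Python sketch that walks raw .eh_frame bytes using only the length and ID fields of each record. It is not LLD's implementation; it assumes little-endian data and a 4-byte ID field, and the function name is made up for illustration.</div>
<pre>
```python
def walk_eh_frame(data):
    """Yield (offset, size, kind) for each record in raw .eh_frame bytes.

    Assumes little-endian contents. Only the length and ID fields are
    read; the rest of each record is skipped without being parsed.
    """
    pos = 0
    while len(data) - pos >= 4:
        length = int.from_bytes(data[pos:pos + 4], "little")
        header = 4
        if length == 0:
            # A zero length marks the end of the section.
            yield (pos, 4, "terminator")
            return
        if length == 0xFFFFFFFF:
            # Extended form: the real length follows in the next 8 bytes.
            length = int.from_bytes(data[pos + 4:pos + 12], "little")
            header = 12
        id_start = pos + header
        record_id = int.from_bytes(data[id_start:id_start + 4], "little")
        # In .eh_frame an ID of zero marks a CIE; anything else is an FDE.
        kind = "CIE" if record_id == 0 else "FDE"
        yield (pos, header + length, kind)
        pos += header + length


if __name__ == "__main__":
    # Hand-built sample: one CIE, one FDE, then a terminator.
    cie = (8).to_bytes(4, "little") + (0).to_bytes(4, "little") + bytes(4)
    fde = (8).to_bytes(4, "little") + (16).to_bytes(4, "little") + bytes(4)
    for record in walk_eh_frame(cie + fde + bytes(4)):
        print(record)
```
</pre>
<div>The walker never looks past the ID field, which is the property a link-time-friendly DWARF layout would also need.</div>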
<div><br>
</div>
<div>James<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Mon, 12 Oct 2020 at 20:48, David
Blaikie &lt;<a
href="mailto:dblaikie@gmail.com"
target="_blank"
moz-do-not-send="true">dblaikie@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Awesome! Sorry I missed
the lightning talk, but really
interested to see this sort of thing
(though it's not
directly/immediately applicable to
the use case I work with - Split
DWARF, something similar could be
used there with further work)<br>
<br>
Though it looks like the patch has
mostly linker changes - where/how do
you generate the fragmented DWARF to
begin with? Via the Python script?
Run over assembly? I'd be surprised
if it was achievable that way -
curious to know more.<br>
<br>
Got a rough sense/are you able to
run apples-to-apples comparisons
with Alexey's linker-based patches
to compare linker time/memory
overhead versus resulting output
size gains?<br>
<br>
(&amp; yeah, I'm a bit curious about
how the linkers do eh_frame
rewriting, if the format is
especially amenable to a lightweight
parsing/rewriting and how we could
make the DWARF more amenable to that
too)</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Mon, Oct 12, 2020 at 6:41 AM James
Henderson &lt;<a
href="mailto:jh7370.2008@my.bristol.ac.uk"
target="_blank"
moz-do-not-send="true">jh7370.2008@my.bristol.ac.uk</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Hi all,</div>
<div><br>
</div>
<div>At the recent LLVM
developers' meeting, I
presented a lightning talk on
an approach to reduce the
amount of dead debug data left
in an executable following
operations such as
--gc-sections and duplicate
COMDAT removal. In that
presentation, I presented some
figures based on linking a
game that had been built by
our downstream clang port and
fragmented using the described
approach. Since recording the
presentation, I ran the same
experiment on a clang package
(this time built with a GCC
version). The comparable
figures are below:</div>
<div><br>
</div>
<div>Link-time speed (s):</div>
<div>
<pre>
+--------------------+-------+---------------+------+------+------+------+------+
| Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
+--------------------+-------+---------------+------+------+------+------+------+
| Game (plain)       |   4.5 |           4.9 |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
| Game (fragmented)  |  11.1 |          11.8 |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
| Clang (plain)      |  13.9 |          17.9 | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
| Clang (fragmented) |  18.6 |          22.8 | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
+--------------------+-------+---------------+------+------+------+------+------+
</pre>
</div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Output
size - Game package (MB):</font></font></div>
<div>
<pre>
+---------------------+-------+------+------+------+------+------+------+
| Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
+---------------------+-------+------+------+------+------+------+------+
| Plain (total)       |  1149 | 1121 | 1017 |  965 |  938 |  930 |  928 |
| Plain (DWARF*)      |   845 |  845 |  845 |  845 |  845 |  845 |  845 |
| Plain (other)       |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
| Fragmented (total)  |  1044 |  940 |  556 |  373 |  287 |  263 |  255 |
| Fragmented (DWARF*) |   740 |  664 |  384 |  253 |  194 |  178 |  173 |
| Fragmented (other)  |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
+---------------------+-------+------+------+------+------+------+------+
</pre>
</div>
<div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Output
size - Clang (MB):</font></font></div>
<div>
<pre>
+---------------------+-------+------+------+------+------+------+------+
| Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
+---------------------+-------+------+------+------+------+------+------+
| Plain (total)       |  2596 | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
| Plain (DWARF*)      |  1979 | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
| Plain (other)       |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
| Fragmented (total)  |  2397 | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
| Fragmented (DWARF*) |  1780 | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
| Fragmented (other)  |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
+---------------------+-------+------+------+------+------+------+------+
</pre>
</div>
</div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace">*DWARF
size == total size of
.debug_info + .debug_line +
.debug_ranges +
.debug_aranges + .debug_loc<br>
</font></div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Additionally,
I have posted <a
href="https://reviews.llvm.org/D89229"
target="_blank"
moz-do-not-send="true">https://reviews.llvm.org/D89229</a>
which provides the Python
script and linker patches
used to reproduce the
above results on my
machine. GC
1/2/3/4/5/6 correspond to
the --mark-live-pc linker option added in
that patch, with
values
1/0.8/0.6/0.4/0.2/0
respectively.<br>
</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif"><br>
</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif">During
the conference, the
question was asked what
the impact on memory usage
and input size was. I've
summarised these below:</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif"><br>
</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Input
file size total (GB):</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif"> <span
style="font-family:monospace">+--------------------+------------+ </span></font></font></div>
<div><span style="font-family:monospace">| Package variant&nbsp;&nbsp;&nbsp; | Total Size |<br>
</span></div>
<div><span style="font-family:monospace">+--------------------+------------+<br>
</span></div>
<div><span style="font-family:monospace">| Game (plain)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.9 |<br>
</span></div>
<div><span style="font-family:monospace">| Game (fragmented)&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4.2 |<br>
</span></div>
<div><span style="font-family:monospace">| Clang (plain)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 10.9 |<br>
</span></div>
<div><span style="font-family:monospace">| Clang (fragmented) |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 12.3 |<br>
</span></div>
<div><span style="font-family:monospace">+--------------------+------------+</span></div>
<div><span
style="font-family:monospace"><br>
</span></div>
<div><span
style="font-family:monospace"><span
style="font-family:arial,sans-serif">Peak Working Set Memory usage (GB):</span><br>
</span></div>
<div><span
style="font-family:monospace">
</span>
<div><font face="monospace"><font
face="arial,sans-serif"><span
style="font-family:monospace">+--------------------+-------+------+ </span></font></font></div>
<div><span style="font-family:monospace">| Package variant&nbsp;&nbsp;&nbsp; | No GC | GC 1 |<br>
</span></div>
<div><span style="font-family:monospace">+--------------------+-------+------+<br>
</span></div>
<div><span style="font-family:monospace">| Game (plain)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; 4.3 |&nbsp; 4.7 |<br>
</span></div>
<div><span style="font-family:monospace">| Game (fragmented)&nbsp; |&nbsp;&nbsp; 8.9 |&nbsp; 8.6 |<br>
</span></div>
<div><span style="font-family:monospace">| Clang (plain)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp; 15.7 | 15.6 |<br>
</span></div>
<div><span style="font-family:monospace">| Clang (fragmented) |&nbsp; 19.4 | 19.2 |<br>
</span></div>
<div><span style="font-family:monospace">+--------------------+-------+------+</span></div>
<div><span
style="font-family:monospace"><br>
</span></div>
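<div><span
style="font-family:monospace"><font
face="arial,sans-serif">The relative cost of fragmentation can be read off the two tables above. The following is a minimal Python sketch, with the figures copied from those tables and with hypothetical names (INPUT_SIZE, PEAK_MEMORY_NO_GC, PEAK_MEMORY_GC_1, increase_percent, increases) chosen only for illustration.</font></span></div>
<pre>
```python
# Figures in GB, copied from the two tables above as (plain, fragmented).
INPUT_SIZE = {"Game": (2.9, 4.2), "Clang": (10.9, 12.3)}
PEAK_MEMORY_NO_GC = {"Game": (4.3, 8.9), "Clang": (15.7, 19.4)}
PEAK_MEMORY_GC_1 = {"Game": (4.7, 8.6), "Clang": (15.6, 19.2)}


def increase_percent(plain, fragmented):
    """Return how much larger the fragmented figure is, as a percentage of plain."""
    return 100.0 * (fragmented - plain) / plain


def increases(table):
    """Map each package name to its rounded fragmented-over-plain increase."""
    return {
        name: round(increase_percent(plain, fragmented), 1)
        for name, (plain, fragmented) in table.items()
    }


if __name__ == "__main__":
    for label, table in [
        ("input size", INPUT_SIZE),
        ("peak memory, No GC", PEAK_MEMORY_NO_GC),
        ("peak memory, GC 1", PEAK_MEMORY_GC_1),
    ]:
        print(label, increases(table))
```
</pre>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif">On these figures, fragmentation grows the Game inputs by about 45% and roughly doubles the No GC peak memory, while for Clang the increases are about 13% and 24% respectively.</font></span></div>
<div><span
style="font-family:monospace"><br>
</span></div>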
<div><span
style="font-family:monospace"><font
face="arial,sans-serif">I'm
keen to hear what
people's feedback is,
and also interested to
see what results others
might see by running
this experiment on other
input packages. Also, if
anybody has any
alternative ideas that
meet the goals listed
below, I'd love to hear
them!<br>
</font></span></div>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif"><br>
</font></span></div>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif">To
reiterate some key goals
of fragmented DWARF,
similar to what I said
in the presentation:</font></span></div>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif">1)
Devise a scheme that
gives significant size
savings without being
too costly. It's clear
from just the two
packages I've tried this
on that there is a
fairly hefty link-time
performance cost,
although the exact cost
depends on the nature of
the input package. On
the other hand,
depending on the nature
of the input package,
there can also be some
big gains.<br>
</font></span></div>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif">2)
Devise a scheme that
doesn't require any
linker knowledge of
DWARF. The current
approach doesn't quite
achieve this properly
due to the slight misuse
of SHF_LINK_ORDER, but I
expect that a pivot to
using non-COMDAT group
sections should solve
this problem.</font></span></div>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif">3)
Provide some kind of
halfway house between
simply writing tombstone
values into dead DWARF
and fully parsing the
DWARF to reoptimise it
and discard the dead
bits.<br>
</font></span></div>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif"><br>
</font></span></div>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif">I'm
hopeful that changes
could be made to the
linker to improve the
link-time cost. A
significant amount of
the link time seems to
be spent creating the
input sections. An
alternative
would be to devise a
scheme that would avoid
the literal splitting
into section headers, in
favour of some sort of
list of split-points
that the linker uses to
split things up (a bit
like it already does for
.eh_frame or mergeable
sections).</font><br>
</span></div>
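<div><span
style="font-family:monospace"><font
face="arial,sans-serif"><br>
</font></span></div>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif">As an illustration of the split-point idea only, the following Python sketch cuts a section's bytes at a list of offsets and keeps the fragments marked live. It is a toy model under stated assumptions: split_at_points and keep_live are hypothetical names, the liveness flags are assumed to come from the linker's GC pass, and nothing here reflects how LLD actually handles .eh_frame or mergeable sections.</font></span></div>
<pre>
```python
def split_at_points(data, split_points):
    """Cut data (bytes) into fragments at the given ascending offsets.

    Hypothetical model: one monolithic section plus a list of split
    points, instead of one section header per fragment.
    """
    bounds = [0, *split_points, len(data)]
    if bounds != sorted(bounds):
        raise ValueError("split points must be ascending and inside the section")
    return [data[start:end] for start, end in zip(bounds, bounds[1:])]


def keep_live(fragments, live):
    """Concatenate only the fragments whose liveness flag is true."""
    if len(fragments) != len(live):
        raise ValueError("need exactly one liveness flag per fragment")
    return b"".join(
        fragment for fragment, is_live in zip(fragments, live) if is_live
    )


if __name__ == "__main__":
    section = b"AAAABBCCC"
    fragments = split_at_points(section, [4, 6])
    # Assume the middle fragment describes a function that was discarded.
    print(keep_live(fragments, [True, False, True]))
```
</pre>
<div><span
style="font-family:monospace"><font
face="arial,sans-serif">The point of the sketch is that the splitting becomes cheap slicing driven by a side list, so the linker would not need to create an input section object per fragment up front.</font></span></div>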
<span
style="font-family:monospace">
</span></div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>