<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi James, <br>
<br>
Thank you very much for the information.<br>
Regarding the first problem: could you please send me the clang build
configuration that you used, so that I can reproduce the problem?
<br>
</p>
<p>For the second problem: yes, I built the experiment with
-ffunction-sections -fdata-sections.<br>
Judging by the error message, it seems that the address ranges were
read incorrectly.<br>
As a quick guess: could it be that invalid address ranges are
marked with a -1/-2 value? If so, they might be handled incorrectly,
since this patch does not support (and was not tested with) the
LowPC&gt;HighPC case. The simplest solution would be not to use
-1/-2 values with this patch. <br>
</p>
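<p>To illustrate the guess above, here is a minimal Python sketch. It
assumes 64-bit addresses and assumes that HighPC is stored as an offset
from LowPC, so that a LowPC overwritten with -1 or -2 wraps around. The
helper names and the sample length are hypothetical and are not taken
from the patch.</p>
<pre>
```python
ADDRESS_SPACE = 2**64  # assumption: 64-bit target addresses


def as_unsigned(value):
    """Interpret a possibly negative marker such as -1 or -2 as an unsigned address."""
    return value % ADDRESS_SPACE


def range_from_low_pc_and_length(low_pc, length):
    """Return (LowPC, HighPC), assuming HighPC is stored as an offset from LowPC."""
    low = as_unsigned(low_pc)
    high = (low + length) % ADDRESS_SPACE  # wraps around for -1/-2 markers
    return low, high


def is_inverted(low, high):
    """True for the LowPC>HighPC case described above."""
    return low > high


if __name__ == "__main__":
    for marker in (-1, -2):
        low, high = range_from_low_pc_and_length(marker, 0x40)
        print(hex(low), hex(high), is_inverted(low, high))
```
</pre>
<p>If the ranges in the game package look like this, filtering them out
before they reach the patch would be one way to check the guess.</p>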
<p>Thank you, Alexey.<br>
</p>
<div class="moz-cite-prefix">On 29.10.2020 13:52, James Henderson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CABqSp3n5gfmQFQZ2Wq36CKX0_GE+fJxKkUmc-oRbEk3i=xqQnw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div>Hi Alexey,</div>
<div><br>
</div>
<div>I've just started looking at running your patch on the
clang and game packages I used for the Fragmented DWARF
experiment, and on both occasions, I got "warning: Generated
debug info is broken" near the end of the link. Digging
further, the actual error this represented (for the clang
case) was "invalid e_shentsize in ELF header: 16912" (aside:
there are several Expected instances around where the former
warning was reported which are being thrown away and will
cause assertions under the right configuration). I don't
really follow the code enough to understand whether this is a
bug in the code or possibly some weird interaction with our
downstream patches (I don't expect the latter, for the clang
build, as our patches are supposed to be a no-op when not
using our target). I'll check what happens with the clang
package if I try using a completely vanilla LLVM with your
patch applied.</div>
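<div>For reference when reproducing the e_shentsize error: in a 64-bit
ELF file, e_shentsize is a 2-byte field at offset 58 of the ELF header
and is expected to equal the size of one section header entry, which is
64 bytes. The Python sketch below checks that field. It is only an
illustration: the function names are hypothetical, it handles ELF64
only, and the demo header is synthetic rather than taken from the
failing link.</div>
<pre>
```python
ELF64_SHDR_SIZE = 64     # size of one ELF64 section header entry
E_SHENTSIZE_OFFSET = 58  # offset of e_shentsize in an ELF64 header


def read_e_shentsize(header):
    """Return e_shentsize from the first 64 bytes of an ELF64 file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("this sketch only handles ELF64")
    byteorder = "little" if header[5] == 1 else "big"  # EI_DATA
    field = header[E_SHENTSIZE_OFFSET:E_SHENTSIZE_OFFSET + 2]
    return int.from_bytes(field, byteorder)


def has_valid_shentsize(header):
    """True if e_shentsize matches the ELF64 section header size."""
    return read_e_shentsize(header) == ELF64_SHDR_SIZE


def make_header(shentsize):
    """Build a synthetic little-endian ELF64 header for the demo (not a real file)."""
    header = bytearray(64)
    header[:4] = b"\x7fELF"
    header[4] = 2  # ELFCLASS64
    header[5] = 1  # ELFDATA2LSB
    header[58:60] = shentsize.to_bytes(2, "little")
    return bytes(header)


if __name__ == "__main__":
    for size in (64, 16912):
        header = make_header(size)
        print(size, has_valid_shentsize(header))
```
</pre>
<div>Running the same check on the first 64 bytes of the linker output
would show whether the header itself was written with the value 16912
or whether it was misread later.</div>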
<div><br>
</div>
<div>I also got a large number of "no mapping for range"
warnings when linking the game package. I tried debugging the
code in the area, but the data types are all difficult to
debug, and I don't really understand the relevant area of code
enough to be able to theorise what actually is causing this.
llvm-dwarfdump --verify doesn't flag up any issues, and
there's nothing obviously broken looking at the dump of the
debug data either. Any pointers as to what might be going
wrong would be appreciated. I assume with your experiments
that you build with -ffunction-sections/-fdata-sections for
maximum GC opportunities?</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>James<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 19 Oct 2020 at 09:50,
James Henderson &lt;<a
href="mailto:jh7370.2008@my.bristol.ac.uk"
moz-do-not-send="true">jh7370.2008@my.bristol.ac.uk</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Great, thanks Alexey! I'll try to take a look
at this in the near future, and will report my results back
here. I imagine our clang results will differ, purely
because we probably used different toolchains to build the
input in the first place.<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, 15 Oct 2020 at
10:08, Alexey Lapshin &lt;<a
href="mailto:avl.lapshin@gmail.com" target="_blank"
moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 13.10.2020 10:20, James Henderson wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>The script included in the patch can be used to
convert an object containing normal DWARF into an
object using fragmented DWARF. It does this by
using llvm-dwarfdump to dump the various sections,
parses the output to identify where it should
split (using the offsets of the various entries),
and then writes new section headers accordingly -
you can see roughly what it's doing if you get a
chance to watch the talk recording. The additional
section headers are appended to the end of the ELF
section header table, whilst the original DWARF is
left in the same place it was before (making use
of the fact that section headers don't have to
appear in offset order). The script also parses
and fragments the relocation sections targeting
the DWARF sections so that they match up with the
fragmented DWARF sections. This is clearly all
suboptimal - in practice the compiler should be
modified to do the fragmenting upfront, to save
having to parse a tool's stdout, but that was just
the simplest thing I could come up with to quickly
write the script. Full details of the script usage
are included in the patch description, if you want
to play around with it.</div>
<div><br>
</div>
<div>If Alexey could point me at the latest version
of his patch, I'd be happy to run that through
either or both of the packages I used to see what
happens. Equally, I'd be happy if Alexey is able
to run my script to fragment and measure the
performance of a couple of projects he's been
working with. Based purely on the two packages
I've tried this with, I can tell already that the
results can vary wildly. My expectation is that
Alexey's approach will be slower (at least in its
current form, but probably more generally), but
produce smaller output, though by how much I have
no idea.<br>
</div>
</div>
</blockquote>
<p>James, I updated the patch - <a
href="https://reviews.llvm.org/D74169"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D74169</a>.</p>
<p>To make it work, it is necessary to build the example
with -ffunction-sections and pass the following options
to the linker:</p>
<p>--gc-sections --gc-debuginfo --gc-debuginfo-no-odr</p>
<p>For the clang binary I got the following results:</p>
<p>1. --gc-sections = binary size 1.5G, Debug Info
size(*) 1.2G</p>
<p>2. --gc-sections --gc-debuginfo = binary size 840M,
8x performance decrease, Debug Info size 542M<br>
</p>
<p>3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
= binary size 1.3G, 16x performance decrease, Debug
Info size 1G<br>
</p>
<p>(*)
.debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc<br>
</p>
<p><br>
</p>
<p>I added the option --gc-debuginfo-no-odr so that the size
reduction can be compared fairly. Without that
option, D74169 deduplicates types, and it is then
not correct to compare the resulting size with the "Fragmented
DWARF" solution, which does not deduplicate types.<br>
</p>
<p>I am also looking at your <font face="monospace"><font
face="arial,sans-serif"><a
href="https://reviews.llvm.org/D89229"
target="_blank" moz-do-not-send="true">D89229</a>
and will share the results later.<br>
</font></font></p>
<p>Thank you, Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I think linkers parse .eh_frame partly because
they have no other choice. That being said, I
think its format is not too complex, so similarly
the parser isn't too complex. You can see LLD's
ELF implementation in ELF/EhFrame.cpp, how it is
used in ELF/InputSection.cpp (see the bits to do
with EhInputSection) and EhFrameSection in
ELF/SyntheticSections.h (plus various usages of
these two throughout the LLD code). I think the
key to any structural changes in the DWARF format
to make them more amenable to link-time parsing is
being able to read a minimal amount without
needing to parse the payload (e.g. a length field,
some sort of type, and then using the relocations
to associate it accordingly).</div>
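<div>As an illustration of reading a minimal amount without parsing
the payload: each .eh_frame record starts with a 4-byte length followed
by a 4-byte ID field that is zero for a CIE and non-zero for an FDE, so
record boundaries can be found from those two fields alone. The Python
sketch below does only that. It is not LLD's implementation, the
function name is hypothetical, the 64-bit extended length form is left
out, and the demo bytes are synthetic.</div>
<pre>
```python
def split_records(data, byteorder="little"):
    """Return (offset, size, kind) per record, reading only length and ID fields."""
    records = []
    offset = 0
    while len(data) - offset >= 4:
        length = int.from_bytes(data[offset:offset + 4], byteorder)
        if length == 0:  # a zero length terminates the section
            break
        if length == 0xFFFFFFFF:
            raise NotImplementedError("extended length is not handled in this sketch")
        id_field = int.from_bytes(data[offset + 4:offset + 8], byteorder)
        kind = "CIE" if id_field == 0 else "FDE"
        size = 4 + length  # the length field does not count itself
        records.append((offset, size, kind))
        offset += size
    return records


def demo_section():
    """Synthetic section: one CIE, one FDE that points back to it, and a terminator."""
    cie = (8).to_bytes(4, "little") + (0).to_bytes(4, "little") + b"\x01\x02\x03\x04"
    fde = (12).to_bytes(4, "little") + (16).to_bytes(4, "little") + bytes(8)
    return cie + fde + bytes(4)


if __name__ == "__main__":
    for record in split_records(demo_section()):
        print(record)
```
</pre>
<div>The payload bytes are skipped entirely; associating each record
with the code it describes would then be done through the relocations,
as described above.</div>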
<div><br>
</div>
<div>James<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 12 Oct
2020 at 20:48, David Blaikie &lt;<a
href="mailto:dblaikie@gmail.com" target="_blank"
moz-do-not-send="true">dblaikie@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Awesome! Sorry I missed the
lightning talk, but really interested to see
this sort of thing (though it's not
directly/immediately applicable to the use case
I work with - Split DWARF, something similar
could be used there with further work)<br>
<br>
Though it looks like the patch has mostly linker
changes - where/how do you generate the
fragmented DWARF to begin with? Via the Python
script? Run over assembly? I'd be surprised if
it was achievable that way - curious to know
more.<br>
<br>
Got a rough sense/are you able to run
apples-to-apples comparisons with Alexey's
linker-based patches to compare linker
time/memory overhead versus resulting output
size gains?<br>
<br>
(&amp; yeah, I'm a bit curious about how the
linkers do eh_frame rewriting, if the format is
especially amenable to a lightweight
parsing/rewriting and how we could make the
DWARF more amenable to that too)</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Oct
12, 2020 at 6:41 AM James Henderson &lt;<a
href="mailto:jh7370.2008@my.bristol.ac.uk"
target="_blank" moz-do-not-send="true">jh7370.2008@my.bristol.ac.uk</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Hi all,</div>
<div><br>
</div>
<div>At the recent LLVM developers' meeting,
I presented a lightning talk on an
approach to reduce the amount of dead
debug data left in an executable following
operations such as --gc-sections and
duplicate COMDAT removal. In that
presentation, I presented some figures
based on linking a game that had been
built by our downstream clang port and
fragmented using the described approach.
Since recording the presentation, I ran
the same experiment on a clang package
(this time built with a GCC version). The
comparable figures are below:</div>
<div><br>
</div>
<div>Link-time speed (s):</div>
<div><span style="font-family:monospace">+--------------------+-------+---------------+------+------+------+------+------+</span><br>
</div>
<div><font face="monospace">| Package
variant | No GC | GC 1 (normal) | GC
2 | GC 3 | GC 4 | GC 5 | GC 6 |</font></div>
<div><font face="monospace">+--------------------+-------+---------------+------+------+------+------+------+<br>
</font></div>
<div><font face="monospace">| Game
(plain) | 4.5 | 4.9 |
4.2 | 3.6 | 3.4 | 3.3 | 3.2 |<br>
</font></div>
<div><font face="monospace">| Game
(fragmented) | 11.1 | 11.8 |
9.7 | 8.6 | 7.9 | 7.7 | 7.5 |<br>
</font></div>
<div><font face="monospace">| Clang
(plain) | 13.9 | 17.9 |
17.0 | 16.7 | 16.3 | 16.2 | 16.1 |<br>
</font></div>
<div><font face="monospace">| Clang
(fragmented) | 18.6 | 22.8 |
21.6 | 21.1 | 20.8 | 20.5 | 20.2 |</font></div>
<div><font face="monospace">+--------------------+-------+---------------+------+------+------+------+------+</font></div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Output size -
Game package (MB):</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif"><span
style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+</span><br>
</font></font></div>
<div><span style="font-family:monospace">|
Category | No GC | GC 1 | GC
2 | GC 3 | GC 4 | GC 5 | GC 6 |<br>
</span></div>
<div><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+<br>
</span></div>
<div><span style="font-family:monospace">|
Plain (total) | 1149 | 1121 |
1017 | 965 | 938 | 930 | 928 |<br>
</span></div>
<div><span style="font-family:monospace">|
Plain (DWARF*) | 845 | 845 |
845 | 845 | 845 | 845 | 845 |<br>
</span></div>
<div><span style="font-family:monospace">|
Plain (other) | 304 | 276 |
172 | 120 | 93 | 85 | 82 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (total) | 1044 | 940 |
556 | 373 | 287 | 263 | 255 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (DWARF*) | 740 | 664 |
384 | 253 | 194 | 178 | 173 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (other) | 304 | 276 |
172 | 120 | 93 | 85 | 82 |<br>
</span></div>
<div><font face="monospace">+---------------------+-------+------+------+------+------+------+------+
</font>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Output size
- Clang (MB):</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif"><span
style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+</span><br>
</font></font></div>
<div><span style="font-family:monospace">|
Category | No GC | GC 1 |
GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |<br>
</span></div>
<div><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+<br>
</span></div>
<div><span style="font-family:monospace">|
Plain (total) | 2596 | 2546 |
2406 | 2332 | 2293 | 2273 | 2251 |<br>
</span></div>
<div><span style="font-family:monospace">|
Plain (DWARF*) | 1979 | 1979 |
1979 | 1979 | 1979 | 1979 | 1979 |<br>
</span></div>
<div><span style="font-family:monospace">|
Plain (other) | 616 | 567 |
426 | 353 | 314 | 294 | 272 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (total) | 2397 | 2346 |
2164 | 2069 | 2017 | 1990 | 1963 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (DWARF*) | 1780 | 1780 |
1738 | 1716 | 1703 | 1696 | 1691 |<br>
</span></div>
<div><span style="font-family:monospace">|
Fragmented (other) | 616 | 567 |
426 | 353 | 314 | 294 | 272 |<br>
</span></div>
<div><font face="monospace">+---------------------+-------+------+------+------+------+------+------+</font></div>
</div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace">*DWARF size ==
total size of .debug_info + .debug_line
+ .debug_ranges + .debug_aranges +
.debug_loc<br>
</font></div>
<div><font face="monospace"><br>
</font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Additionally,
I have posted <a
href="https://reviews.llvm.org/D89229"
target="_blank"
moz-do-not-send="true">https://reviews.llvm.org/D89229</a>
which provides the python script and
linker patches used to reproduce the
above results on my machine. The GC
1/2/3/4/5/6 correspond to the linker
option added in that patch
--mark-live-pc with values
1/0.8/0.6/0.4/0.2/0 respectively.<br>
</font></font></div>
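<div>To make the column labels easier to read against the option
values, the Python sketch below pairs each GC column with its
--mark-live-pc value and works out how much smaller the fragmented
DWARF is than the plain DWARF for the game package. The sizes are
copied from the game package table above; the variable and function
names are hypothetical and the percentages are derived here, not
measured separately.</div>
<pre>
```python
# --mark-live-pc value used for each GC column, as listed above.
MARK_LIVE_PC = {
    "GC 1": 1.0, "GC 2": 0.8, "GC 3": 0.6,
    "GC 4": 0.4, "GC 5": 0.2, "GC 6": 0.0,
}

# DWARF sizes in MB for the game package, copied from the table above.
PLAIN_DWARF_MB = 845
FRAGMENTED_DWARF_MB = {
    "GC 1": 664, "GC 2": 384, "GC 3": 253,
    "GC 4": 194, "GC 5": 178, "GC 6": 173,
}


def dwarf_saving_percent(level):
    """Percentage by which fragmented DWARF is smaller than plain DWARF at a GC level."""
    return 100.0 * (1 - FRAGMENTED_DWARF_MB[level] / PLAIN_DWARF_MB)


if __name__ == "__main__":
    for level, value in MARK_LIVE_PC.items():
        saving = dwarf_saving_percent(level)
        print(f"{level} (--mark-live-pc={value}): {saving:.1f}% smaller DWARF")
```
</pre>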
<div><font face="monospace"><font
face="arial,sans-serif"><br>
</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif">During the
conference, the question was asked
what the memory usage and input size
impact was. I've summarised these
below:</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif"><br>
</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif">Input file
size total (GB):</font></font></div>
<div><font face="monospace"><font
face="arial,sans-serif"> <span
style="font-family:monospace">+--------------------+------------+
</span></font></font></div>
<div><span style="font-family:monospace"> |
Package variant | Total Size | <br>
</span></div>
<div><span style="font-family:monospace">
+--------------------+------------+<br>
</span></div>
<div><span style="font-family:monospace"> |
Game (plain) | 2.9 | <br>
</span></div>
<div><span style="font-family:monospace"> |
Game (fragmented) | 4.2 |<br>
</span></div>
<div><span style="font-family:monospace"> |
Clang (plain) | 10.9 |<br>
</span></div>
<div><span style="font-family:monospace"> |
Clang (fragmented) | 12.3 |<br>
</span></div>
<div><span style="font-family:monospace">
+--------------------+------------+</span></div>
<div><span style="font-family:monospace"><br>
</span></div>
<div><span style="font-family:monospace"><span
style="font-family:arial,sans-serif">Peak
Working Set Memory usage (GB):</span><br>
</span></div>
<div><span style="font-family:monospace"> </span>
<div><font face="monospace"><font
face="arial,sans-serif"><span
style="font-family:monospace">+--------------------+-------+------+
</span></font></font></div>
<div><span style="font-family:monospace">
| Package variant | No GC | GC 1 |<br>
</span></div>
<div><span style="font-family:monospace">
+--------------------+-------+------+<br>
</span></div>
<div><span style="font-family:monospace">
| Game (plain) | 4.3 | 4.7 |<br>
</span></div>
<div><span style="font-family:monospace">
| Game (fragmented) | 8.9 | 8.6 |<br>
</span></div>
<div><span style="font-family:monospace">
| Clang (plain) | 15.7 | 15.6 |<br>
</span></div>
<div><span style="font-family:monospace">
| Clang (fragmented) | 19.4 | 19.2 |<br>
</span></div>
<div><span style="font-family:monospace">
+--------------------+-------+------+</span></div>
<div><span style="font-family:monospace"><br>
</span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">I'm keen to
hear what people's feedback is, and
also interested to see what results
others might see by running this
experiment on other input packages.
Also, if anybody has any alternative
ideas that meet the goals listed
below, I'd love to hear them!<br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif"><br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">To reiterate
some key goals of fragmented DWARF,
similar to what I said in the
presentation:</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">1) Devise a
scheme that gives significant size
savings without being too costly.
It's clear from just the two
packages I've tried this on that
there is a fairly hefty link time
performance cost, although the exact
cost depends on the nature of the
input package. On the other hand,
depending on the nature of the input
package, there can also be some big
gains.<br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">2) Devise a
scheme that doesn't require any
linker knowledge of DWARF. The
current approach doesn't quite
achieve this properly due to the
slight misuse of SHF_LINK_ORDER, but
I expect that a pivot to using
non-COMDAT group sections should
solve this problem.</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">3) Provide
some kind of halfway house between
simply writing tombstone values into
dead DWARF and fully parsing the
DWARF to reoptimise it/discard the
dead bits.<br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif"><br>
</font></span></div>
<div><span style="font-family:monospace"><font
face="arial,sans-serif">I'm hopeful
that changes could be made to the
linker to improve the link-time
cost. There seems to be a
significant amount of the link time
spent creating the input sections.
An alternative would be to devise a
scheme that would avoid the literal
splitting into section headers, in
favour of some sort of list of
split-points that the linker uses to
split things up (a bit like it
already does for .eh_frame or
mergeable sections).</font><br>
</span></div>
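<div>A rough Python sketch of the split-point idea, under the
assumption that the split points arrive as a plain list of byte offsets
into one unsplit section and that liveness is known per piece. The
function names and the sample data are hypothetical; this is not how
any existing linker represents it.</div>
<pre>
```python
def split_at(data, split_points):
    """Split one section's bytes into (start_offset, chunk) pieces at the given offsets."""
    bounds = [0] + sorted(split_points) + [len(data)]
    return [
        (start, data[start:end])
        for start, end in zip(bounds, bounds[1:])
        if end > start  # skip empty pieces
    ]


def keep_live(pieces, live_starts):
    """Concatenate only the pieces whose start offset is marked live."""
    return b"".join(chunk for start, chunk in pieces if start in live_starts)


if __name__ == "__main__":
    section = b"AAAABBBBBBCC"
    pieces = split_at(section, [4, 10])
    print(pieces)
    print(keep_live(pieces, {0, 10}))
```
</pre>
<div>The point of the sketch is that the input stays one section with
one header, and only the list of offsets grows with the number of
fragments.</div>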
<span style="font-family:monospace"> </span></div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>