[llvm-dev] Fragmented DWARF
Alexey Lapshin via llvm-dev
llvm-dev at lists.llvm.org
Thu Nov 5 11:58:29 PST 2020
On 04.11.2020 16:57, James Henderson wrote:
> Great, thanks! Those results are roughly what I was expecting. I
> assume "compilation time" is actually just the link time?
Yes, that is the link time.
>
> I find it particularly interesting that the DWARFLinker rewriting
> solution produces the same size improvement in .debug_line as the
> fragmented DWARF approach. That suggests that in that case, fragmented
> DWARF output is probably about as optimal as it can get. I'm not
> surprised that the same can't be said for other sections, but I'm also
> pleased to see that the full rewrite option isn't that much better in
> terms of size.
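As a quick check of the .debug_line observation, the sketch below recomputes the reductions from the size tables posted in this thread (KiB for llvm-strings, MiB for clang). The numbers are copied by hand from those tables, and the `reduction_pct` helper is only for illustration:

```python
# .debug_line sizes copied from the size tables in this thread.
# llvm-strings values are in KiB, clang values are in MiB.
DEBUG_LINE = {
    "llvm-strings": {"upstream": 827.0, "fragmented": 313.0, "dwarflinker": 304.0},
    "clang": {"upstream": 115.0, "fragmented": 57.4, "dwarflinker": 57.4},
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which a section shrank relative to the upstream link."""
    return 100.0 * (before - after) / before


for binary, sizes in DEBUG_LINE.items():
    base = sizes["upstream"]
    for variant in ("fragmented", "dwarflinker"):
        print(f"{binary:12} {variant:11} -{reduction_pct(base, sizes[variant]):.1f}%")
```

With these inputs both variants shrink .debug_line by about 62-63% for llvm-strings and by about 50% for clang.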
>
> Regarding the problems I was having with the patch, if you want to try
> reproducing the problems with clang, I built commit 05d02e5a of clang
> using gcc 7.5.0 on Ubuntu 18.04 to generate an ELF package. I then
> used LLD to relink it to create a reproducible package. As I'm
> primarily a Windows developer, I transferred this package to my
> Windows machine so that I could use my existing Windows checkout of
> LLVM, applied your patch, rebuilt LLD, and used that to try linking
> the package, and got the error message I mentioned. I'm going to have another try
> at the latter now to see if I can figure out what the issue is myself.
>
> James
>
> On Wed, 4 Nov 2020 at 13:35, Alexey Lapshin <avl.lapshin at gmail.com>
> wrote:
>
>
> On 04.11.2020 15:28, James Henderson wrote:
>> Hi Alexey,
>>
>> Thanks for taking a look at these. I noticed you set the
>> --mark-live-pc value to a value other than 1 for the fragmented
>> DWARF version. This will mean additional GC-ing will be done
>> beyond the amount that --gc-sections will do, so unless you use
>> the same value for the option for other versions, the result will
>> not be comparable. (The option is purely there to experiment with
>> the effects were different amounts of the input codebase to be
>> considered dead). Would you be okay to run those figures again
>> without the option specified?
>
> Oh, I misinterpreted that option. Here are the updated results:
>
> 1. llvm-strings:
>
> source object files size: 381M.
> fragmented source object files size: 451M (18% increase).
>
> a. upstream version,
> command line options: --gc-sections
> binary size: 6.5M
> compilation time: 0:00.13 sec
> run-time memory: 111kb
>
> b. "fragmented DWARF" version,
> command line options: --gc-sections
> binary size: 5.3M
> compilation time: 0:00.11 sec
> run-time memory: 125kb
>
> c. DWARFLinker version,
> command line options: --gc-sections --gc-debuginfo
> binary size: 3.8M
> compilation time: 0:00.33 sec
> run-time memory: 141kb
>
> d. DWARFLinker no-odr version,
> command line options: --gc-sections --gc-debuginfo
> --gc-debuginfo-no-odr
> binary size: 4.3M
> compilation time: 0:00.38 sec
> run-time memory: 142kb
>
>
> 2. clang:
>
> source object files size: 6.5G.
> fragmented source object files size: 7.3G (13% increase).
>
> a. upstream version,
> command line options: --gc-sections
> binary size: 1.5G
> compilation time: 6 sec
> run-time memory: 9.7G
>
> b. "fragmented DWARF" version,
> command line options: --gc-sections
> binary size: 1.4G
> compilation time: 8 sec
> run-time memory: 12G
>
> c. DWARFLinker version,
> command line options: --gc-sections --gc-debuginfo
> binary size: 836M
> compilation time: 62 sec
> run-time memory: 15G
>
> d. DWARFLinker no-odr version,
> command line options: --gc-sections --gc-debuginfo
> --gc-debuginfo-no-odr
> binary size: 1.3G
> compilation time: 128 sec
> run-time memory: 17G
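For the clang link above, the relative costs can be derived directly from the listed figures. This is only arithmetic on the reported numbers ("compilation time" here is the link time), not a new measurement:

```python
# Link time (s) and peak memory (GB) for the clang link, copied from the list above.
RESULTS = {
    "upstream": (6, 9.7),
    "fragmented": (8, 12.0),
    "dwarflinker": (62, 15.0),
    "dwarflinker-no-odr": (128, 17.0),
}

base_time, base_mem = RESULTS["upstream"]
for variant, (time_s, mem_gb) in RESULTS.items():
    # Ratios relative to the upstream (--gc-sections only) link.
    print(f"{variant:20} time x{time_s / base_time:.1f}  memory x{mem_gb / base_mem:.2f}")
```

This gives roughly 1.3x the upstream link time for fragmented DWARF, versus about 10x and 21x for the two DWARFLinker modes.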
>
> Detailed size results:
>
> 1. a)
>
> FILE SIZE VM SIZE
> -------------- --------------
> 41.1% 2.64Mi 0.0% 0 .debug_info
> 24.9% 1.60Mi 0.0% 0 .debug_str
> 12.6% 827Ki 0.0% 0 .debug_line
> 6.5% 428Ki 63.8% 428Ki .text
> 4.8% 317Ki 0.0% 0 .strtab
> 3.4% 223Ki 0.0% 0 .debug_ranges
> 2.0% 133Ki 19.8% 133Ki .eh_frame
> 1.7% 110Ki 0.0% 0 .symtab
> 1.2% 77.6Ki 0.0% 0 .debug_abbrev
>
> b)
>
> FILE SIZE VM SIZE
> -------------- --------------
> 40.2% 2.10Mi 0.0% 0 .debug_info
> 30.7% 1.60Mi 0.0% 0 .debug_str
> 8.0% 428Ki 63.8% 428Ki .text
> 5.9% 317Ki 0.0% 0 .strtab
> 5.9% 313Ki 0.0% 0 .debug_line
> 2.5% 133Ki 19.8% 133Ki .eh_frame
> 2.1% 110Ki 0.0% 0 .symtab
> 1.5% 77.6Ki 0.0% 0 .debug_abbrev
> 1.3% 69.2Ki 0.0% 0 .debug_ranges
>
> c)
>
> FILE SIZE VM SIZE
> -------------- --------------
> 33.0% 1.25Mi 0.0% 0 .debug_info
> 29.2% 1.11Mi 0.0% 0 .debug_str
> 11.0% 428Ki 63.8% 428Ki .text
> 8.2% 317Ki 0.0% 0 .strtab
> 7.8% 304Ki 0.0% 0 .debug_line
> 3.4% 133Ki 19.8% 133Ki .eh_frame
> 2.8% 110Ki 0.0% 0 .symtab
> 1.7% 65.9Ki 0.0% 0 .debug_ranges
> 1.0% 38.4Ki 5.7% 38.4Ki .rodata
>
> d)
>
> FILE SIZE VM SIZE
> -------------- --------------
> 39.7% 1.68Mi 0.0% 0 .debug_info
> 26.3% 1.11Mi 0.0% 0 .debug_str
> 9.9% 428Ki 63.8% 428Ki .text
> 7.3% 317Ki 0.0% 0 .strtab
> 7.0% 304Ki 0.0% 0 .debug_line
> 3.1% 133Ki 19.8% 133Ki .eh_frame
> 2.6% 110Ki 0.0% 0 .symtab
> 1.5% 65.9Ki 0.0% 0 .debug_ranges
>
>
> 2. a)
>
> FILE SIZE VM SIZE
> -------------- --------------
> 58.3% 878Mi 0.0% 0 .debug_info
> 11.8% 177Mi 0.0% 0 .debug_str
> 7.7% 115Mi 62.2% 115Mi .text
> 7.7% 115Mi 0.0% 0 .debug_line
> 6.0% 90.7Mi 0.0% 0 .strtab
> 2.4% 35.4Mi 0.0% 0 .debug_ranges
> 1.5% 23.3Mi 12.5% 23.3Mi .eh_frame
> 1.5% 23.0Mi 12.4% 23.0Mi .rodata
> 1.2% 17.9Mi 0.0% 0 .symtab
>
> b)
>
> FILE SIZE VM SIZE
> -------------- --------------
> 59.6% 807Mi 0.0% 0 .debug_info
> 13.1% 177Mi 0.0% 0 .debug_str
> 8.5% 115Mi 62.2% 115Mi .text
> 6.7% 90.7Mi 0.0% 0 .strtab
> 4.2% 57.4Mi 0.0% 0 .debug_line
> 1.7% 23.3Mi 12.5% 23.3Mi .eh_frame
> 1.7% 23.0Mi 12.4% 23.0Mi .rodata
> 1.3% 17.9Mi 0.0% 0 .symtab
> 1.0% 13.0Mi 0.0% 0 .debug_ranges
> 0.8% 10.6Mi 5.7% 10.6Mi .dynstr
>
> c)
>
> FILE SIZE VM SIZE
> -------------- --------------
> 35.1% 293Mi 0.0% 0 .debug_info
> 21.2% 177Mi 0.0% 0 .debug_str
> 13.9% 115Mi 62.2% 115Mi .text
> 10.9% 90.7Mi 0.0% 0 .strtab
> 6.9% 57.4Mi 0.0% 0 .debug_line
> 2.8% 23.3Mi 12.5% 23.3Mi .eh_frame
> 2.8% 23.0Mi 12.4% 23.0Mi .rodata
> 2.1% 17.9Mi 0.0% 0 .symtab
> 1.5% 12.4Mi 0.0% 0 .debug_ranges
> 1.3% 10.6Mi 5.7% 10.6Mi .dynstr
>
> d)
>
> FILE SIZE VM SIZE
> -------------- --------------
> 58.3% 758Mi 0.0% 0 .debug_info
> 13.6% 177Mi 0.0% 0 .debug_str
> 8.9% 115Mi 62.2% 115Mi .text
> 7.0% 90.7Mi 0.0% 0 .strtab
> 4.4% 57.4Mi 0.0% 0 .debug_line
> 1.8% 23.3Mi 12.5% 23.3Mi .eh_frame
> 1.8% 23.0Mi 12.4% 23.0Mi .rodata
> 1.4% 17.9Mi 0.0% 0 .symtab
> 1.0% 12.4Mi 0.0% 0 .debug_ranges
> 0.8% 10.6Mi 5.7% 10.6Mi .dynstr
>
>
>>
>> I'm still trying to figure out the problems on my end to try
>> running your experiment on the game package I used in my
>> presentation, but have been interrupted by other unrelated
>> issues. I'll try to get back to this in the coming days.
>>
>> James
>>
>> On Wed, 4 Nov 2020 at 11:54, Alexey Lapshin
>> <avl.lapshin at gmail.com> wrote:
>>
>> Hi James,
>>
>> I did experiments with the clang code base and will do
>> experiments with our local codebase later.
>> Overall, both solutions ("Fragmented DWARF" and "DWARFLinker
>> without ODR type deduplication") appear to give similar size
>> savings for the final binary. "DWARFLinker with ODR type
>> deduplication" gives bigger size savings.
>> "Fragmented DWARF" increases the size of the original object
>> files by roughly 13-18%.
>> LLD with "fragmented DWARF" is significantly faster than
>> with "DWARFLinker".
>>
>> Here are the results for the "llvm-strings" and "clang"
>> binaries:
>>
>> 1. llvm-strings:
>>
>> source object files size: 381M.
>> fragmented source object files size: 451M (18% increase).
>>
>> a. upstream version,
>> command line options: --gc-sections
>> binary size: 6.5M
>> compilation time: 0:00.13 sec
>> run-time memory: 111kb
>>
>> b. "fragmented DWARF" version,
>> command line options: --gc-sections --mark-live-pc=0.45
>> binary size: 3.7M
>> compilation time: 0:00.10 sec
>> run-time memory: 122kb
>>
>> c. DWARFLinker version,
>> command line options: --gc-sections --gc-debuginfo
>> binary size: 3.8M
>> compilation time: 0:00.33 sec
>> run-time memory: 141kb
>>
>> d. DWARFLinker no-odr version,
>> command line options: --gc-sections --gc-debuginfo
>> --gc-debuginfo-no-odr
>> binary size: 4.3M
>> compilation time: 0:00.38 sec
>> run-time memory: 142kb
>>
>>
>> 2. clang:
>>
>> source object files size: 6.5G.
>> fragmented source object files size: 7.3G (13% increase).
>>
>> a. upstream version,
>> command line options: --gc-sections
>> binary size: 1.5G
>> compilation time: 6 sec
>> run-time memory: 9.7G
>>
>> b. "fragmented DWARF" version,
>> command line options: --gc-sections --mark-live-pc=0.43
>> binary size: 1.1G
>> compilation time: 9 sec
>> run-time memory: 11G
>>
>> c. DWARFLinker version,
>> command line options: --gc-sections --gc-debuginfo
>> binary size: 836M
>> compilation time: 62 sec
>> run-time memory: 15G
>>
>> d. DWARFLinker no-odr version,
>> command line options: --gc-sections --gc-debuginfo
>> --gc-debuginfo-no-odr
>> binary size: 1.3G
>> compilation time: 128 sec
>> run-time memory: 17G
>>
>> Detailed size results:
>>
>> 1. llvm-strings
>>
>> a)
>>
>> FILE SIZE VM SIZE
>> -------------- --------------
>> 41.1% 2.64Mi 0.0% 0 .debug_info
>> 24.9% 1.60Mi 0.0% 0 .debug_str
>> 12.6% 827Ki 0.0% 0 .debug_line
>> 6.5% 428Ki 63.8% 428Ki .text
>> 4.8% 317Ki 0.0% 0 .strtab
>> 3.4% 223Ki 0.0% 0 .debug_ranges
>> 2.0% 133Ki 19.8% 133Ki .eh_frame
>> 1.7% 110Ki 0.0% 0 .symtab
>> 1.2% 77.6Ki 0.0% 0 .debug_abbrev
>>
>> b)
>>
>> FILE SIZE VM SIZE
>> -------------- --------------
>> 50.3% 1.85Mi 0.0% 0 .debug_info
>> 43.6% 1.60Mi 0.0% 0 .debug_str
>> 2.6% 98.2Ki 0.0% 0 .debug_line
>> 2.1% 77.6Ki 0.0% 0 .debug_abbrev
>> 0.5% 17.5Ki 54.9% 17.4Ki .text
>> 0.3% 9.94Ki 0.0% 0 .strtab
>> 0.2% 6.27Ki 0.0% 0 .symtab
>> 0.1% 5.09Ki 15.9% 5.03Ki .eh_frame
>> 0.1% 3.28Ki 0.0% 0 .debug_ranges
>>
>> c)
>>
>> FILE SIZE VM SIZE
>> -------------- --------------
>> 33.0% 1.25Mi 0.0% 0 .debug_info
>> 29.2% 1.11Mi 0.0% 0 .debug_str
>> 11.0% 428Ki 63.8% 428Ki .text
>> 8.2% 317Ki 0.0% 0 .strtab
>> 7.8% 304Ki 0.0% 0 .debug_line
>> 3.4% 133Ki 19.8% 133Ki .eh_frame
>> 2.8% 110Ki 0.0% 0 .symtab
>> 1.7% 65.9Ki 0.0% 0 .debug_ranges
>> 1.0% 38.4Ki 5.7% 38.4Ki .rodata
>>
>> d)
>>
>> FILE SIZE VM SIZE
>> -------------- --------------
>> 39.7% 1.68Mi 0.0% 0 .debug_info
>> 26.3% 1.11Mi 0.0% 0 .debug_str
>> 9.9% 428Ki 63.8% 428Ki .text
>> 7.3% 317Ki 0.0% 0 .strtab
>> 7.0% 304Ki 0.0% 0 .debug_line
>> 3.1% 133Ki 19.8% 133Ki .eh_frame
>> 2.6% 110Ki 0.0% 0 .symtab
>> 1.5% 65.9Ki 0.0% 0 .debug_ranges
>>
>>
>> 2. clang
>>
>> a)
>>
>> FILE SIZE VM SIZE
>> -------------- --------------
>> 58.3% 878Mi 0.0% 0 .debug_info
>> 11.8% 177Mi 0.0% 0 .debug_str
>> 7.7% 115Mi 62.2% 115Mi .text
>> 7.7% 115Mi 0.0% 0 .debug_line
>> 6.0% 90.7Mi 0.0% 0 .strtab
>> 2.4% 35.4Mi 0.0% 0 .debug_ranges
>> 1.5% 23.3Mi 12.5% 23.3Mi .eh_frame
>> 1.5% 23.0Mi 12.4% 23.0Mi .rodata
>> 1.2% 17.9Mi 0.0% 0 .symtab
>>
>> b)
>>
>> FILE SIZE VM SIZE
>> -------------- --------------
>> 71.5% 772Mi 0.0% 0 .debug_info
>> 16.5% 177Mi 0.0% 0 .debug_str
>> 3.7% 40.2Mi 59.2% 40.2Mi .text
>> 2.4% 25.8Mi 0.0% 0 .debug_line
>> 2.1% 23.0Mi 0.0% 0 .strtab
>> 1.0% 10.6Mi 15.6% 10.6Mi .dynstr
>> 0.7% 7.18Mi 10.6% 7.18Mi .eh_frame
>> 0.5% 5.60Mi 0.0% 0 .symtab
>> 0.4% 4.28Mi 0.0% 0 .debug_ranges
>> 0.4% 4.04Mi 0.0% 0 .debug_abbrev
>>
>>
>> c)
>>
>> FILE SIZE VM SIZE
>> -------------- --------------
>> 35.1% 293Mi 0.0% 0 .debug_info
>> 21.2% 177Mi 0.0% 0 .debug_str
>> 13.9% 115Mi 62.2% 115Mi .text
>> 10.9% 90.7Mi 0.0% 0 .strtab
>> 6.9% 57.4Mi 0.0% 0 .debug_line
>> 2.8% 23.3Mi 12.5% 23.3Mi .eh_frame
>> 2.8% 23.0Mi 12.4% 23.0Mi .rodata
>> 2.1% 17.9Mi 0.0% 0 .symtab
>> 1.5% 12.4Mi 0.0% 0 .debug_ranges
>> 1.3% 10.6Mi 5.7% 10.6Mi .dynstr
>>
>> d)
>>
>> FILE SIZE VM SIZE
>> -------------- --------------
>> 58.3% 758Mi 0.0% 0 .debug_info
>> 13.6% 177Mi 0.0% 0 .debug_str
>> 8.9% 115Mi 62.2% 115Mi .text
>> 7.0% 90.7Mi 0.0% 0 .strtab
>> 4.4% 57.4Mi 0.0% 0 .debug_line
>> 1.8% 23.3Mi 12.5% 23.3Mi .eh_frame
>> 1.8% 23.0Mi 12.4% 23.0Mi .rodata
>> 1.4% 17.9Mi 0.0% 0 .symtab
>> 1.0% 12.4Mi 0.0% 0 .debug_ranges
>> 0.8% 10.6Mi 5.7% 10.6Mi .dynstr
>>
>> Thank you, Alexey.
>>
>> On 19.10.2020 11:50, James Henderson wrote:
>>> Great, thanks Alexey! I'll try to take a look at this in the
>>> near future, and will report my results back here. I imagine
>>> our clang results will differ, purely because we probably
>>> used different toolchains to build the input in the first place.
>>>
>>> On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin
>>> <avl.lapshin at gmail.com> wrote:
>>>
>>>
>>> On 13.10.2020 10:20, James Henderson wrote:
>>>> The script included in the patch can be used to convert
>>>> an object containing normal DWARF into an object using
>>>> fragmented DWARF. It does this by using llvm-dwarfdump
>>>> to dump the various sections, parses the output to
>>>> identify where it should split (using the offsets of
>>>> the various entries), and then writes new section
>>>> headers accordingly - you can see roughly what it's
>>>> doing if you get a chance to watch the talk recording.
>>>> The additional section headers are appended to the end
>>>> of the ELF section header table, whilst the original
>>>> DWARF is left in the same place it was before (making
>>>> use of the fact that section headers don't have to
>>>> appear in offset order). The script also parses and
>>>> fragments the relocation sections targeting the DWARF
>>>> sections so that they match up with the fragmented
>>>> DWARF sections. This is clearly all suboptimal - in
>>>> practice the compiler should be modified to do the
>>>> fragmenting upfront, to save having to parse a tool's
>>>> stdout, but that was just the simplest thing I could
>>>> come up with to quickly write the script. Full details
>>>> of the script usage are included in the patch
>>>> description, if you want to play around with it.
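The core of the fragmenting step can be sketched as follows, assuming the entry offsets have already been extracted (the real script gets them by parsing llvm-dwarfdump output). `Fragment` and `fragment_section` are hypothetical names, not taken from D89229; the sketch splits a section at the given offsets and moves each relocation into the fragment that contains it, rebasing its offset:

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Fragment:
    start: int  # offset of the fragment's first byte in the original section
    end: int    # offset one past the fragment's last byte
    relocs: list = field(default_factory=list)  # (fragment-relative offset, symbol)


def fragment_section(size, split_offsets, relocs):
    """Split a section of `size` bytes at the given entry offsets and
    reassign each (offset, symbol) relocation to the fragment containing it."""
    # Offsets outside (0, size) would produce empty fragments, so drop them.
    bounds = sorted({0, size, *(o for o in split_offsets if 0 < o < size)})
    frags = [Fragment(s, e) for s, e in zip(bounds, bounds[1:])]
    starts = [f.start for f in frags]
    for offset, symbol in sorted(relocs):
        # The owning fragment is the last one starting at or before `offset`.
        frag = frags[bisect.bisect_right(starts, offset) - 1]
        frag.relocs.append((offset - frag.start, symbol))
    return frags


frags = fragment_section(0x30, [0x10, 0x20], [(0x14, ".text.foo"), (0x24, ".text.bar")])
for f in frags:
    print(hex(f.start), hex(f.end), f.relocs)
```

Each fragment would then get its own section header, and its relocations would go into a matching relocation section, which is the part the script emulates by appending headers to the end of the section header table.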
>>>>
>>>> If Alexey could point me at the latest version of his
>>>> patch, I'd be happy to run that through either or both
>>>> of the packages I used to see what happens. Equally,
>>>> I'd be happy if Alexey is able to run my script to
>>>> fragment and measure the performance of a couple of
>>>> projects he's been working with. Based purely on the
>>>> two packages I've tried this with, I can tell already
>>>> that the results can vary wildly. My expectation is
>>>> that Alexey's approach will be slower (at least in its
>>>> current form, but probably more generally) but will produce
>>>> smaller output; to what extent, I have no idea.
>>>
>>> James, I updated the patch -
>>> https://reviews.llvm.org/D74169.
>>>
>>> To make it work, build the example with
>>> -ffunction-sections and pass the following options to the
>>> linker:
>>>
>>> --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>>
>>> For the clang binary I got the following results:
>>>
>>> 1. --gc-sections = binary size 1.5G, Debug Info size(*) 1.2G
>>>
>>> 2. --gc-sections --gc-debuginfo = binary size 840M, 8x
>>> slower link, Debug Info size 542M
>>>
>>> 3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr =
>>> binary size 1.3G, 16x slower link, Debug Info
>>> size 1G
>>>
>>> (*)
>>> .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>>>
>>>
>>> I added the --gc-debuginfo-no-odr option so that the size
>>> reductions can be compared fairly. Without that option,
>>> D74169 performs type deduplication, so it would not be
>>> correct to compare its output size with the "Fragmented
>>> DWARF" solution, which does not deduplicate types.
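A toy model of why the two modes differ in size: with ODR-based deduplication, a C++ type definition with a given fully qualified name can be emitted once per link, while without it each compile unit keeps its own copy. This is a hypothetical simplification (`count_type_dies` and its input format are made up for illustration), not how D74169 actually walks and compares DIEs:

```python
def count_type_dies(cus, odr=True):
    """Count type definitions kept in the output.

    `cus` maps a compile-unit name to the fully qualified type names it defines.
    With ODR deduplication, each type name is kept once for the whole link;
    without it, every compile unit keeps its own copy.
    """
    if odr:
        return len({name for types in cus.values() for name in types})
    return sum(len(set(types)) for types in cus.values())


cus = {"a.cpp": ["std::string", "llvm::StringRef"], "b.cpp": ["std::string", "Foo"]}
print(count_type_dies(cus, odr=True), count_type_dies(cus, odr=False))
```

Because "Fragmented DWARF" never merges types across compile units, the no-ODR count is the fairer baseline for comparing it against DWARFLinker.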
>>>
>>> Also, I will look at your D89229
>>> <https://reviews.llvm.org/D89229> and share
>>> results some time later.
>>>
>>> Thank you, Alexey.
>>>
>>>>
>>>> I think linkers parse .eh_frame partly because they
>>>> have no other choice. That being said, I think its
>>>> format is not too complex, so the parser isn't too
>>>> complex either. You can see LLD's ELF implementation
>>>> in ELF/EhFrame.cpp, how it is used in
>>>> ELF/InputSection.cpp (see the bits to do with
>>>> EhInputSection) and EhFrameSection in
>>>> ELF/SyntheticSections.h (plus various usages of these
>>>> two throughout the LLD code). I think the key to any
>>>> structural changes in the DWARF format to make them
>>>> more amenable to link-time parsing is being able to
>>>> read a minimal amount without needing to parse the
>>>> payload (e.g. a length field, some sort of type, and
>>>> then using the relocations to associate it accordingly).
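To illustrate how little needs to be read, here is a hypothetical Python sketch (not a port of LLD's ELF/EhFrame.cpp) that walks a little-endian .eh_frame section by reading only each record's length field and its CIE id / CIE pointer field, skipping the payload entirely. It assumes well-formed input and does no validation beyond what `struct` enforces:

```python
import struct


def eh_frame_records(data: bytes):
    """Yield (offset, total_size, kind) for each CIE/FDE in a little-endian .eh_frame."""
    off = 0
    while off + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, off)
        header = 4
        if length == 0:  # a zero length terminates the section
            break
        if length == 0xFFFFFFFF:  # 64-bit extended length follows
            (length,) = struct.unpack_from("<Q", data, off + 4)
            header = 12
        # The 4-byte field after the length is 0 for a CIE and a CIE pointer for an FDE.
        (cie_id,) = struct.unpack_from("<I", data, off + header)
        yield off, header + length, "CIE" if cie_id == 0 else "FDE"
        off += header + length  # skip the payload without parsing it


# A CIE with an 8-byte payload, an FDE pointing back at it, then a terminator.
section = (struct.pack("<II", 12, 0) + bytes(8)
           + struct.pack("<II", 8, 20) + bytes(4)
           + struct.pack("<I", 0))
print(list(eh_frame_records(section)))
```

A DWARF record format with the same property (a length, a small type field, and relocations for association) would let a linker split and discard pieces in a similar way.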
>>>>
>>>> James
>>>>
>>>> On Mon, 12 Oct 2020 at 20:48, David Blaikie
>>>> <dblaikie at gmail.com> wrote:
>>>>
>>>> Awesome! Sorry I missed the lightning talk, but
>>>> really interested to see this sort of thing (though
>>>> it's not directly/immediately applicable to the use
>>>> case I work with - Split DWARF, something similar
>>>> could be used there with further work)
>>>>
>>>> Though it looks like the patch has mostly linker
>>>> changes - where/how do you generate the fragmented
>>>> DWARF to begin with? Via the Python script? Run
>>>> over assembly? I'd be surprised if it was
>>>> achievable that way - curious to know more.
>>>>
>>>> Got a rough sense/are you able to run
>>>> apples-to-apples comparisons with Alexey's
>>>> linker-based patches to compare linker time/memory
>>>> overhead versus resulting output size gains?
>>>>
>>>> (& yeah, I'm a bit curious about how the linkers do
>>>> eh_frame rewriting, if the format is especially
>>>> amenable to a lightweight parsing/rewriting and how
>>>> we could make the DWARF more amenable to that too)
>>>>
>>>> On Mon, Oct 12, 2020 at 6:41 AM James Henderson
>>>> <jh7370.2008 at my.bristol.ac.uk> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> At the recent LLVM developers' meeting, I
>>>> presented a lightning talk on an approach to
>>>> reduce the amount of dead debug data left in an
>>>> executable following operations such as
>>>> --gc-sections and duplicate COMDAT removal. In
>>>> that presentation, I presented some figures
>>>> based on linking a game that had been built by
>>>> our downstream clang port and fragmented using
>>>> the described approach. Since recording the
>>>> presentation, I ran the same experiment on a
>>>> clang package (this time built with GCC). The
>>>> comparable figures are below:
>>>>
>>>> Link-time speed (s):
>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>> | Game (plain)       | 4.5   | 4.9           | 4.2  | 3.6  | 3.4  | 3.3  | 3.2  |
>>>> | Game (fragmented)  | 11.1  | 11.8          | 9.7  | 8.6  | 7.9  | 7.7  | 7.5  |
>>>> | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>>> | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>>
>>>> Output size - Game package (MB):
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Plain (total)       | 1149  | 1121 | 1017 | 965  | 938  | 930  | 928  |
>>>> | Plain (DWARF*)      | 845   | 845  | 845  | 845  | 845  | 845  | 845  |
>>>> | Plain (other)       | 304   | 276  | 172  | 120  | 93   | 85   | 82   |
>>>> | Fragmented (total)  | 1044  | 940  | 556  | 373  | 287  | 263  | 255  |
>>>> | Fragmented (DWARF*) | 740   | 664  | 384  | 253  | 194  | 178  | 173  |
>>>> | Fragmented (other)  | 304   | 276  | 172  | 120  | 93   | 85   | 82   |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>
>>>>
>>>> Output size - Clang (MB):
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>>> | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>>> | Plain (other)       | 616   | 567  | 426  | 353  | 314  | 294  | 272  |
>>>> | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>>> | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>>> | Fragmented (other)  | 616   | 567  | 426  | 353  | 314  | 294  | 272  |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>
>>>> *DWARF size == total size of .debug_info +
>>>> .debug_line + .debug_ranges + .debug_aranges +
>>>> .debug_loc
>>>>
>>>> Additionally, I have posted
>>>> https://reviews.llvm.org/D89229 which provides
>>>> the python script and linker patches used to
>>>> reproduce the above results on my machine. The
>>>> GC 1/2/3/4/5/6 correspond to the linker option
>>>> added in that patch --mark-live-pc with values
>>>> 1/0.8/0.6/0.4/0.2/0 respectively.
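Combining the GC column to --mark-live-pc mapping with the Game output-size table shows how much smaller the fragmented DWARF is than the (unchanged) plain DWARF at each level. The figures are copied from the tables above; the script is only arithmetic:

```python
# GC column -> --mark-live-pc value, as described in the message.
MARK_LIVE_PC = {"GC 1": 1.0, "GC 2": 0.8, "GC 3": 0.6, "GC 4": 0.4, "GC 5": 0.2, "GC 6": 0.0}

# "Fragmented (DWARF*)" row of the Game output-size table, in MB.
GAME_FRAGMENTED_DWARF = {"No GC": 740, "GC 1": 664, "GC 2": 384, "GC 3": 253,
                         "GC 4": 194, "GC 5": 178, "GC 6": 173}
PLAIN_DWARF = 845  # the plain DWARF size is the same at every GC level


def saving_pct(level: str) -> float:
    """How much smaller the fragmented DWARF is than the plain DWARF, in percent."""
    return 100 * (1 - GAME_FRAGMENTED_DWARF[level] / PLAIN_DWARF)


for level, pc in MARK_LIVE_PC.items():
    print(f"{level} (--mark-live-pc={pc}): DWARF {saving_pct(level):.0f}% smaller than plain")
```

This prints about 21% at GC 1 (--mark-live-pc=1.0), rising to about 80% at GC 6.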
>>>>
>>>> During the conference, someone asked what the
>>>> impact on memory usage and input size was. I've
>>>> summarised these below:
>>>>
>>>> Input file size total (GB):
>>>> +--------------------+------------+
>>>> | Package variant | Total Size |
>>>> +--------------------+------------+
>>>> | Game (plain) | 2.9 |
>>>> | Game (fragmented) | 4.2 |
>>>> | Clang (plain) | 10.9 |
>>>> | Clang (fragmented) | 12.3 |
>>>> +--------------------+------------+
>>>>
>>>> Peak Working Set Memory usage (GB):
>>>> +--------------------+-------+------+
>>>> | Package variant | No GC | GC 1 |
>>>> +--------------------+-------+------+
>>>> | Game (plain) | 4.3 | 4.7 |
>>>> | Game (fragmented) | 8.9 | 8.6 |
>>>> | Clang (plain) | 15.7 | 15.6 |
>>>> | Clang (fragmented) | 19.4 | 19.2 |
>>>> +--------------------+-------+------+
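The relative overheads implied by the two tables above can be computed as follows (the GC 1 column is used for memory; values are copied from the tables):

```python
# Input size (GB) and peak working set (GB, GC 1 column): (plain, fragmented).
INPUT_GB = {"Game": (2.9, 4.2), "Clang": (10.9, 12.3)}
PEAK_MEM_GB = {"Game": (4.7, 8.6), "Clang": (15.6, 19.2)}


def pct_increase(plain: float, fragmented: float) -> float:
    """Relative growth of the fragmented variant over the plain one, in percent."""
    return 100 * (fragmented / plain - 1)


for pkg in INPUT_GB:
    print(f"{pkg}: input +{pct_increase(*INPUT_GB[pkg]):.0f}%, "
          f"peak memory +{pct_increase(*PEAK_MEM_GB[pkg]):.0f}%")
```

The fragmented inputs are about 45% (Game) and 13% (Clang) larger, and peak memory grows by about 83% and 23% respectively.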
>>>>
>>>> I'm keen to hear what people's feedback is, and
>>>> also interested to see what results others
>>>> might see by running this experiment on other
>>>> input packages. Also, if anybody has any
>>>> alternative ideas that meet the goals listed
>>>> below, I'd love to hear them!
>>>>
>>>> To reiterate some key goals of fragmented
>>>> DWARF, similar to what I said in the presentation:
>>>> 1) Devise a scheme that gives significant size
>>>> savings without being too costly. It's clear
>>>> from just the two packages I've tried this on
>>>> that there is a fairly hefty link time
>>>> performance cost, although the exact cost
>>>> depends on the nature of the input package. On
>>>> the other hand, depending on the nature of the
>>>> input package, there can also be some big gains.
>>>> 2) Devise a scheme that doesn't require any
>>>> linker knowledge of DWARF. The current approach
>>>> doesn't quite achieve this properly due to the
>>>> slight misuse of SHF_LINK_ORDER, but I expect
>>>> that a pivot to using non-COMDAT group sections
>>>> should solve this problem.
>>>> 3) Provide some kind of halfway house between
>>>> simply writing tombstone values into dead DWARF
>>>> and fully parsing the DWARF to reoptimise
>>>> it or discard the dead parts.
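For reference, a plain (non-COMDAT) ELF section group is an SHT_GROUP section whose contents are a flag word followed by member section header indices, with GRP_COMDAT left clear. The sketch below encodes that layout; the constant values are from the ELF gABI, while grouping a code section with its DWARF fragments and the example indices are illustrative assumptions, not what D89229 currently does:

```python
import struct

SHT_GROUP = 17      # section type of a group section (gABI)
SHF_GROUP = 0x200   # flag set on every member of a section group (gABI)
GRP_COMDAT = 0x1    # group flag; left clear for a plain, non-COMDAT group


def group_section_contents(member_indices, comdat=False):
    """Encode the contents of a little-endian SHT_GROUP section.

    The first 32-bit word holds the group flags; each following word is the
    section header index of one member section.
    """
    flags = GRP_COMDAT if comdat else 0
    return struct.pack(f"<{1 + len(member_indices)}I", flags, *member_indices)


def is_group_member(sh_flags: int) -> bool:
    """True if a section header's flags mark it as belonging to a group."""
    return bool(sh_flags & SHF_GROUP)


# Hypothetical indices: .text.foo at 5, its .debug_info and .debug_line fragments at 12 and 13.
print(struct.unpack("<4I", group_section_contents([5, 12, 13])))
```

A complete group also needs the group section header's sh_link (symbol table) and sh_info (signature symbol) set, which this sketch leaves out.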
>>>>
>>>> I'm hopeful that changes could be made to the
>>>> linker to improve the link-time cost. A significant
>>>> amount of the link time seems to be spent creating
>>>> the input sections. An
>>>> alternative would be to devise a scheme that
>>>> would avoid the literal splitting into section
>>>> headers, in favour of some sort of list of
>>>> split-points that the linker uses to split
>>>> things up (a bit like it already does for
>>>> .eh_frame or mergeable sections).
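A minimal sketch of the split-point alternative, assuming the linker keeps one input section plus a list of offsets and knows which code section each piece describes (here simply passed in as `targets`, a hypothetical stand-in for following the piece's relocation). Dead pieces are dropped and the survivors get new output offsets, similar in spirit to how mergeable-section pieces are handled:

```python
from dataclasses import dataclass


@dataclass
class Piece:
    input_offset: int  # where the piece starts in the original input section
    size: int
    target: str        # section this piece describes (hypothetical: found via its relocation)


def split_by_points(section_size, split_points, targets):
    """Turn one input section plus split offsets into pieces, with no extra section headers."""
    bounds = sorted({0, section_size, *(p for p in split_points if 0 < p < section_size)})
    spans = list(zip(bounds, bounds[1:]))
    if len(spans) != len(targets):
        raise ValueError("need exactly one target per piece")
    return [Piece(s, e - s, t) for (s, e), t in zip(spans, targets)]


def layout(pieces, live_sections):
    """Keep pieces whose target survived GC; return (input, output) offsets and total size."""
    kept, out_offset = [], 0
    for piece in pieces:
        if piece.target in live_sections:
            kept.append((piece.input_offset, out_offset))
            out_offset += piece.size
    return kept, out_offset


pieces = split_by_points(0x30, [0x10, 0x20], [".text.a", ".text.b", ".text.c"])
print(layout(pieces, {".text.a", ".text.c"}))
```

A real implementation would also need to fix up any references between pieces, which this sketch ignores.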
>>>>