[PATCH] D36351: [lld][ELF] Add profile guided section layout
Rafael Avila de Espindola via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 8 16:04:50 PST 2018
Looking a bit more into why I might not be measuring a performance
improvement, I noticed that the call graph I was using was missing
conditional calls. The attached script fixes that.
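For reference, this is roughly the kind of aggregation the script does. It is
a hedged sketch, not the attached get-call-graph.py itself: it assumes a
hypothetical trace file with one "caller callee" pair per line, one entry per
executed call, so conditional calls are counted like any other call.

#!/usr/bin/env python3
# Rough sketch (not the attached get-call-graph.py): build a weighted
# call graph from a hypothetical trace file with one "caller callee"
# pair per line, one entry per executed call (conditional or not).
import sys
from collections import Counter

def build_call_graph(trace_path):
    edges = Counter()
    with open(trace_path) as trace:
        for line in trace:
            parts = line.split()
            if len(parts) != 2:
                continue  # ignore malformed lines
            caller, callee = parts
            edges[(caller, callee)] += 1
    return edges

if __name__ == "__main__":
    for (caller, callee), count in build_call_graph(sys.argv[1]).most_common():
        print(count, caller, callee)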
I have uploaded a new version of the test with the complete call graph
to https://s3-us-west-2.amazonaws.com/linker-tests/t2.tar.xz.
I also noticed that we were not considering the case of multiple symbols
in the same section. The attached patch fixes that.
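To make that concrete, here is a hedged sketch of the idea in Python (the
attached t.diff patches lld's C++ code; the names and inputs below are made
up for illustration): fold symbol-level call counts into section-level edges
so that several symbols defined in the same input section become one node.

# Illustration only, not the attached t.diff.
from collections import Counter

def section_edges(symbol_edges, symbol_to_section):
    # symbol_edges: {(caller_sym, callee_sym): count}
    # symbol_to_section: {symbol: input section} -- hypothetical inputs.
    edges = Counter()
    for (caller, callee), count in symbol_edges.items():
        src = symbol_to_section[caller]
        dst = symbol_to_section[callee]
        if src != dst:  # calls within one section don't affect its placement
            edges[(src, dst)] += count
    return edges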
Even with these changes I still get an iTLB regression.
I am now going to try building hfsort and compare its results.
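For anyone not familiar with hfsort: it implements the call-chain clustering
(C3) heuristic from the HFSort paper. A rough paraphrase of that heuristic
(not hfsort's or lld's actual code; the threshold and inputs are made up) is:

# Rough paraphrase of C3 (call-chain clustering); details such as
# thresholds and tie breaking may differ from the real hfsort.
PAGE_SIZE = 4096

def c3_layout(sizes, samples, call_counts):
    # sizes: {func: byte size}, samples: {func: profile weight},
    # call_counts: {(caller, callee): count} -- all hypothetical inputs.
    cluster_of = {f: [f] for f in sizes}
    for callee in sorted(sizes, key=lambda f: samples.get(f, 0), reverse=True):
        callers = [(c, n) for (c, f), n in call_counts.items() if f == callee]
        if not callers:
            continue
        caller = max(callers, key=lambda cn: cn[1])[0]
        if caller not in cluster_of:
            continue  # e.g. a call from outside the profiled module
        a, b = cluster_of[caller], cluster_of[callee]
        if a is b:
            continue
        if sum(sizes[f] for f in a + b) > PAGE_SIZE:
            continue  # don't grow a cluster past one page
        a.extend(b)  # place the callee's chain right after its hottest caller
        for f in b:
            cluster_of[f] = a
    clusters = {id(c): c for c in cluster_of.values()}.values()
    def density(c):
        return sum(samples.get(f, 0) for f in c) / max(1, sum(sizes[f] for f in c))
    order = []
    for c in sorted(clusters, key=density, reverse=True):
        order.extend(c)
    return order

As I understand it, the page-size cap on merged clusters is the part that is
supposed to help iTLB, which is exactly the counter that regresses here.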
Please upload a new patch on top of tree and include the attached fixes.
-------------- next part --------------
Attachment: get-call-graph.py (application/octet-stream, 1347 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180208/da95f323/attachment.obj>

Attachment: t.diff (text/x-patch, 637 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180208/da95f323/attachment.bin>
-------------- next part --------------
Thanks,
Rafael
Michael Spencer <bigcheesegs at gmail.com> writes:
> On Thu, Feb 8, 2018 at 10:41 AM, Rafael Avila de Espindola <
> rafael.espindola at gmail.com> wrote:
>
>> Michael Spencer <bigcheesegs at gmail.com> writes:
>>
>> > On Tue, Feb 6, 2018 at 6:53 PM, Rafael Avila de Espindola <
>> > rafael.espindola at gmail.com> wrote:
>> >
>> >> I have benchmarked this by timing lld LTOing FileCheck. The working set
>> >> is much larger this time: the old call graph had 4,079 calls, this one
>> >> has 30,616.
>> >>
>> >> The results are somewhat similar:
>> >>
>> >> Performance counter stats for '../default-ld.lld @response.txt' (10 runs):
>> >>
>> >>         498,771      iTLB-load-misses          ( +- 0.10% )
>> >>     224,751,360      L1-icache-load-misses     ( +- 0.00% )
>> >>
>> >>     2.339864606 seconds time elapsed           ( +- 0.06% )
>> >>
>> >> Performance counter stats for '../sorted-ld.lld @response.txt' (10 runs):
>> >>
>> >>         556,999      iTLB-load-misses          ( +- 0.17% )
>> >>     216,788,838      L1-icache-load-misses     ( +- 0.01% )
>> >>
>> >>     2.326596163 seconds time elapsed           ( +- 0.04% )
>> >>
>> >> As with the previous test, iTLB misses get worse and L1 icache misses
>> >> get better. The net result is a very small speedup.
>> >>
>> >> Do you know how big the chromium call graph is?
>> >>
>> >
>> > Not sure, but the call graph for a high profile internal game I tested
>> > is about 10k functions and 17 MiB of .text, and I got a 2-4% speedup.
>> > Given that it's a game, it runs a decent portion of that 17 MiB 60 times
>> > a second, while llvm is heavily pass based, so I don't expect the
>> > instruction working set over a small period of time to be that high.
>>
>> One difference between the paper and the script I am using to create the
>> call graph is that my script records the exact number of times each call
>> happens. The script is attached.
>>
>> With sampling, a call foo->long_running_bar would be recorded multiple
>> times and show up as multiple calls.
>>
>> Exact counts seem better, but I wonder if sampling somehow produces a
>> better result.
>>
>> With instrumentation (which I assume is what you used in the game), you
>> also get an exact callgraph, no?
>>
>
> You get an exact callgraph minus indirect calls, as those currently aren't
> captured.
>
> - Michael Spencer
>
>
>>
>> > I am however surprised by the 10% increase in iTLB misses.
>>
>> Cheers,
>> Rafael