[PATCH] D36351: [lld][ELF] Add profile guided section layout

Thu Feb 8 10:41:51 PST 2018

Michael Spencer <bigcheesegs at gmail.com> writes:

> On Tue, Feb 6, 2018 at 6:53 PM, Rafael Avila de Espindola <
> rafael.espindola at gmail.com> wrote:
>
>> I have benchmarked this by timing lld ltoing FileCheck. The working set
>> is much larger this time. The old callgraph had 4079 calls, this one has
>> 30616.
>>
>> The results are somewhat similar:
>>
>>  Performance counter stats for '../default-ld.lld @response.txt' (10 runs):
>>
>>            498,771      iTLB-load-misses          ( +-  0.10% )
>>        224,751,360      L1-icache-load-misses     ( +-  0.00% )
>>
>>        2.339864606 seconds time elapsed           ( +-  0.06% )
>>
>>  Performance counter stats for '../sorted-ld.lld @response.txt' (10 runs):
>>
>>            556,999      iTLB-load-misses          ( +-  0.17% )
>>        216,788,838      L1-icache-load-misses     ( +-  0.01% )
>>
>>        2.326596163 seconds time elapsed           ( +-  0.04% )
>>
>> As with the previous test iTLB gets worse and L1 gets better. The net
>> result is a very small speedup.
>>
>> Do you know how big the chromium call graph is?
>>
>
> Not sure, but the call graph for a high profile internal game I tested is
> about 10k functions and 17 MiB of .text, and I got a 2%-4% speedup.  Given
> that it's a game, it runs a decent portion of that 17 MiB 60 times a second,
> while llvm is heavily pass based, so I don't expect the instruction working
> set over a small period of time to be that high.

One difference between the paper and the script I am using to create the
call graph is that the script records every call, so each edge carries
the exact number of calls. The script is attached.

With sampling, a call foo->long_running_bar would be recorded multiple
times and show up as multiple calls.

Exact counts seem better, but I wonder if sampling somehow produces a
better result.
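
To make the difference concrete, here is a minimal sketch in Python. It
is not the attached script: the timeline format, the function names and
short_baz are made up for illustration. It builds edge weights for the
same run in both ways, once by counting every call and once by
attributing a sample to whichever call is running at each sampling tick.

```python
from collections import Counter

def exact_edges(timeline):
    """Weight each caller->callee edge by the exact number of calls."""
    return Counter((caller, callee) for caller, callee, _ in timeline)

def sampled_edges(timeline, period):
    """Weight each edge by the number of sampling ticks that land in it.

    timeline is a list of (caller, callee, duration) tuples executed
    back to back; a tick fires at t = 0, period, 2 * period, ...
    (a hypothetical model of a sampling profiler).
    """
    counts = Counter()
    end = 0
    next_sample = 0
    for caller, callee, duration in timeline:
        end += duration
        # Every tick that falls before this call ends is charged to it.
        while next_sample < end:
            counts[(caller, callee)] += 1
            next_sample += period
    return counts

# One long call followed by ten short ones (durations in arbitrary units).
timeline = [("foo", "long_running_bar", 90)] + [("foo", "short_baz", 1)] * 10

exact = exact_edges(timeline)
sampled = sampled_edges(timeline, period=10)
print("exact:  ", dict(exact))
print("sampled:", dict(sampled))
```

In this toy run the exact graph weights foo->short_baz ten times higher
than foo->long_running_bar, while the sampled graph weights
foo->long_running_bar nine times higher, so the two approaches can rank
the same edges in opposite order.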

With instrumentation (which I assume is what you used in the game), you
also get an exact callgraph, no?

>
> I am however surprised by the 10% increase in iTLB misses.
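
For reference, the relative changes follow directly from the counters
quoted above. A small sketch in Python (the dictionary layout and the
rel_change helper are only illustrative, the numbers are the ones from
the two perf stat runs):

```python
# Mean values reported by perf stat for the two linkers (10 runs each).
default = {
    "iTLB-load-misses": 498_771,
    "L1-icache-load-misses": 224_751_360,
    "seconds elapsed": 2.339864606,
}
sorted_layout = {
    "iTLB-load-misses": 556_999,
    "L1-icache-load-misses": 216_788_838,
    "seconds elapsed": 2.326596163,
}

def rel_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

changes = {name: rel_change(default[name], sorted_layout[name])
           for name in default}
for name, pct in changes.items():
    print(f"{name:>24}: {pct:+.2f}%")
```

That works out to roughly +11.7% iTLB misses, -3.5% L1 icache misses and
-0.6% elapsed time.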

-------------- next part --------------
A non-text attachment was scrubbed...
Name: get-call-graph.py
Type: application/octet-stream
Size: 1239 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180208/26e0e231/attachment.obj>
-------------- next part --------------

Cheers,
Rafael