<div dir="ltr"><div class="gmail_extra"><div><div class="gmail_signature" data-smartmail="gmail_signature">On Thu, Feb 8, 2018 at 10:41 AM, Rafael Avila de Espindola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Michael Spencer <<a href="mailto:bigcheesegs@gmail.com">bigcheesegs@gmail.com</a>> writes:<br>

<br>

> On Tue, Feb 6, 2018 at 6:53 PM, Rafael Avila de Espindola <<br>

> <a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>> wrote:<br>

><br>

>> I have benchmarked this by timing lld ltoing FileCheck. The working set<br>

>> is much larger this time. The old callgraph had 4079 calls, this one has<br>

>> 30616.<br>

>><br>

>> The results are somewhat similar:<br>

>><br>

>>  Performance counter stats for '../default-ld.lld @response.txt' (10 runs):<br>

>><br>

>>            498,771      iTLB-load-misses             ( +-  0.10% )<br>

>>        224,751,360      L1-icache-load-misses        ( +-  0.00% )<br>

>><br>

>>        2.339864606 seconds time elapsed              ( +-  0.06% )<br>

>><br>

>>  Performance counter stats for '../sorted-ld.lld @response.txt' (10 runs):<br>

>><br>

>>            556,999      iTLB-load-misses             ( +-  0.17% )<br>

>>        216,788,838      L1-icache-load-misses        ( +-  0.01% )<br>

>><br>

>>        2.326596163 seconds time elapsed              ( +-  0.04% )<br>

>><br>

>> As with the previous test iTLB gets worse and L1 gets better. The net<br>

>> result is a very small speedup.<br>

>><br>

>> Do you know how big the chromium call graph is?<br>

>><br>

><br>

> Not sure, but the call graph for a high profile internal game I tested is<br>

> about 10k functions and 17 MiB of .text, and I got a 2%-4% speedup.  Given<br>

> that it's a game, it runs a decent portion of that 17 MiB 60 times a second,<br>

> while llvm is heavily pass based, so I don't expect the instruction working<br>

> set over a small period of time to be that high.<br>

<br>

</div></div>One difference between the paper and the script I am using to create the<br>

call graph is that my script records every call, so each edge carries an<br>

exact call count. The script is attached.<br>

<br>

With sampling, a call foo->long_running_bar would be recorded multiple<br>

times and show up as multiple calls.<br>
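
<br>

A minimal sketch of that difference, assuming a made-up trace format of<br>

(caller, callee, duration in ticks). The trace, the function names and the<br>

sampling period are all hypothetical; this is not the attached script.<br>

<br>

```python
from collections import Counter

# Hypothetical trace: (caller, callee, duration in ticks).
# These names and numbers are illustrative only.
TRACE = [
    ("foo", "long_running_bar", 50),
    ("foo", "quick_baz", 1),
    ("foo", "quick_baz", 1),
    ("foo", "quick_baz", 1),
]


def exact_edge_weights(trace):
    """Count every call exactly once, ignoring how long it runs."""
    weights = Counter()
    for caller, callee, _duration in trace:
        weights[(caller, callee)] += 1
    return weights


def sampled_edge_weights(trace, period):
    """Record the active caller->callee edge once every `period` ticks."""
    weights = Counter()
    now = 0
    next_sample = 0  # time of the next sample tick
    for caller, callee, duration in trace:
        end = now + duration
        # Every sample tick that lands inside this call adds one to its edge.
        while next_sample < end:
            weights[(caller, callee)] += 1
            next_sample += period
        now = end
    return weights


exact = exact_edge_weights(TRACE)
sampled = sampled_edge_weights(TRACE, period=10)
for edge in sorted(exact):
    print(edge, "exact:", exact[edge], "sampled:", sampled[edge])
```

<br>

In this toy trace, exact counting weights the edge to quick_baz higher,<br>

while sampling weights the edge to long_running_bar higher.<br>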

<br>

Exact counts seem better, but I wonder if sampling somehow produces a<br>

better result.<br>

<br>

With instrumentation (which I assume is what you used in the game), you<br>

also get an exact callgraph, no?<br></blockquote><div><br></div><div>You get an exact callgraph minus indirect calls as those currently aren't captured.</div><div><br></div><div>


<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">- Michael Spencer</span>


<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5"><br>

><br>

> I am however surprised by the 10% increase in iTLB misses.<br>

<br>

<br>

</div></div><br><br>

Cheers,<br>

Rafael<br>

<br></blockquote></div><br></div></div>