<div dir="ltr"><div class="gmail_extra"><div><div class="gmail_signature">On Thu, Feb 8, 2018 at 4:04 PM, Rafael Avila de Espindola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Looking a bit more on why I might not be measuring a performance<br>

improvement I noticed that the callgraph I was using was missing<br>

conditional calls. The attached script fixes that.<br>

<br>

I have uploaded a new version of the test with the complete call graph<br>

to <a href="https://s3-us-west-2.amazonaws.com/linker-tests/t2.tar.xz" rel="noreferrer" target="_blank">https://s3-us-west-2.<wbr>amazonaws.com/linker-tests/t2.<wbr>tar.xz</a>.<br>

<br>

I also noticed that we were not considering the case of multiple symbols<br>

in the same section. The attached patch fixes that.<br></blockquote><div><br></div><div>This is already handled in `CallGraphSort::CallGraphSort()` here:</div><div><br></div><div><div>    NodeIndex From = GetOrCreateNode(FromSB);</div><div>    NodeIndex To = GetOrCreateNode(ToSB);</div><div><br></div><div>    Nodes[To].Weight = SaturatingAdd(Nodes[To].Weight, Weight);</div><div><br></div><div>    if (>From == To)</div><div>      continue;</div></div><div><br></div><div>It's specifically done here after the node weight adjustment so that density calculation later on takes this into account.</div><div><br></div><div>


<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">- Michael Spencer</span>


<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Even with these changes I still get a iTLB regression.<br>

<br>

I am now going to try building hfsort and compare its results.<br>

<br>

Please upload a new patch on top of tree and include the attached fixes.<br>

<br>

<br>diff --git a/ELF/Driver.cpp b/ELF/Driver.cpp<br>

index 1fc9a0ad5..a8d28de4c 100644<br>

--- a/ELF/Driver.cpp<br>

+++ b/ELF/Driver.cpp<br>

@@ -590,8 +590,12 @@ static void readCallGraph(MemoryBufferRef MB) {<br>

       fatal("parse error");<br>

     InputSectionBase *FromSec = SymbolSection.lookup(Fields[0]<wbr>);<br>

     InputSectionBase *ToSec = SymbolSection.lookup(Fields[1]<wbr>);<br>

-    if (FromSec && ToSec)<br>

-      Config->CallGraphProfile[std::<wbr>make_pair(FromSec, ToSec)] = Count;<br>

+    if (FromSec == ToSec)<br>

+      continue;<br>

+    if (FromSec && ToSec) {<br>

+      uint64_t &V = Config->CallGraphProfile[std::<wbr>make_pair(FromSec, ToSec)];<br>

+      V += Count;<br>

+    }<br>

   }<br>

 }<br>

<br>

<br><br>

Thanks,<br>

Rafael<br>

<br>

Michael Spencer <<a href="mailto:bigcheesegs@gmail.com">bigcheesegs@gmail.com</a>> writes:<br>

<br>

> On Thu, Feb 8, 2018 at 10:41 AM, Rafael Avila de Espindola <<br>

> <a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>> wrote:<br>

><br>

>> Michael Spencer <<a href="mailto:bigcheesegs@gmail.com">bigcheesegs@gmail.com</a>> writes:<br>

>><br>

>> > On Tue, Feb 6, 2018 at 6:53 PM, Rafael Avila de Espindola <<br>

>> > <a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>> wrote:<br>

>> ><br>

>> >> I have benchmarked this by timing lld ltoing FileCheck. The working set<br>

>> >> is much larger this time. The old callgraph had 4079 calls, this one has<br>

>> >> 30616.<br>

>> >><br>

>> >> The results are somewhat similar:<br>

>> >><br>

>> >>  Performance counter stats for '../default-ld.lld @response.txt' (10<br>

>> runs):<br>

>> >><br>

>> >>            498,771      iTLB-load-misses<br>

>> >>             ( +-  0.10% )<br>

>> >>        224,751,360      L1-icache-load-misses<br>

>> >>            ( +-  0.00% )<br>

>> >><br>

>> >>        2.339864606 seconds time elapsed<br>

>> >>       ( +-  0.06% )<br>

>> >><br>

>> >>  Performance counter stats for '../sorted-ld.lld @response.txt' (10<br>

>> runs):<br>

>> >><br>

>> >>            556,999      iTLB-load-misses<br>

>> >>             ( +-  0.17% )<br>

>> >>        216,788,838      L1-icache-load-misses<br>

>> >>            ( +-  0.01% )<br>

>> >><br>

>> >>        2.326596163 seconds time elapsed<br>

>> >>       ( +-  0.04% )<br>

>> >><br>

>> >> As with the previous test iTLB gets worse and L1 gets better. The net<br>

>> >> result is a very small speedup.<br>

>> >><br>

>> >> Do you know how big the chromium call graph is?<br>

>> >><br>

>> ><br>

>> > Not sure, but the call graph for a high profile internal game I tested is<br>

>> > about 10k functions and 17 MiB of .text, and I got a %2-%4 speedup.<br>

>> Given<br>

>> > that it's a game it runs a decent portion of that 17MiB 60 times a<br>

>> second,<br>

>> > while llvm is heavily pass based, so I don't expect the instruction<br>

>> working<br>

>> > set over a small period of time to be that high.<br>

>><br>

>> One difference from the paper and the script I am using to create the<br>

>> call graph is that the script I have records every call the exact number<br>

>> of times. The script is attached.<br>

>><br>

>> With sampling, a call foo->long_running_bar would be recorded multiple<br>

>> times and show up as multiple calls.<br>

>><br>

>> The first seems better, but I wonder if sampling somehow produces a<br>

>> better result.<br>

>><br>

>> With instrumentation (which I assume is what you used in the game), you<br>

>> also get an exact callgraph, no?<br>

>><br>

><br>

> You get an exact callgraph minus indirect calls as those currently aren't<br>

> captured.<br>

><br>

> - Michael Spencer<br>

><br>

><br>

>><br>

>> ><br>

>> > I am however surprised by the 10% increase in iTLB misses.<br>

>><br>

>><br>

>><br>

>><br>

>> Cheers,<br>

>> Rafael<br>

>><br>

>><br>

<br></blockquote></div><br></div></div>