<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 28, 2017 at 12:10 PM, Rui Ueyama <span dir="ltr">&lt;<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I don't think getVA is particularly expensive, and if it is not expensive I wouldn't cache its result. Did you experiment with caching getVA results? I think you can do that fairly easily by adding a std::atomic_uint64_t to SymbolBody and using it as a cache for getVA.</div></blockquote><div><br></div><div><br></div><div>You're right, caching it didn't have any significant effect (though I wasn't measuring very precisely). I think I was misremembering the profile. I remember measuring that we had some very bad cache/TLB misses here, but I guess those aren't too important in the current profile (at least, not on this test case; the locality of these accesses depends a lot on the test case).</div><div><br></div><div>Also, it seems like our performance is a lot more stable w.r.t. InputSectionBase::relocate than it used to be (or maybe my current CPU is just less affected; it's a desktop-class processor rather than a Xeon).</div><div><br></div><div><br></div><div>I took a quick profile of this workload and it looks like it is:</div><div><br></div><div>65% in the writer ("backend")<br></div><div>30% in the "frontend" (everything called by SymbolTable::addFile)</div><div><br></div><div>The frontend work seems to be largely dominated by ObjectFile::parse (as you would expect), though about 10% of total runtime is slipping through the cracks here in various other "frontend" tasks.</div><div><br></div><div>The backend work is split about evenly between scanRelocations and OutputSection::writeTo.
InputSectionBase::relocate is only about 10% of the total runtime (part of OutputSection::writeTo).</div><div><br></div><div>Here is some slightly cleaned-up `perf report` output with more details:</div><div><a href="https://reviews.llvm.org/P7972">https://reviews.llvm.org/P7972</a></div><div><br></div><div>So it seems like overall, the profile is basically split 3 ways (about 30% each):</div><div>- frontend (reading input files and building the symbol table and associated data structures)</div><div>- scanRelocations (initial pass over relocations)</div><div>- writeTo (mostly IO and InputSectionBase::relocate)</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 28, 2017 at 4:19 AM, Sean Silva <span dir="ltr">&lt;<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>tl;dr: it looks like we call SymbolBody::getVA about 5x more often than we need to</div><div><br></div><div>Should we cache it or something? (We'd need to be careful with threads.)</div><div><br></div><div><div><br class="gmail-m_5524824443263171716m_-3320630192176821596gmail-Apple-interchange-newline">Here is a link to a PDF of my Mathematica notebook which has all the details of my investigation:</div><div><a href="https://drive.google.com/open?id=0B8v10qJ6EXRxVDQ3YnZtUlFtZ1k" target="_blank">https://drive.google.com/open?<wbr>id=0B8v10qJ6EXRxVDQ3YnZtUlFtZ1<wbr>k</a></div></div><div><br></div><div><br></div><div>There seem to be two main regimes in which we redundantly call SymbolBody::getVA:</div><div><br></div><div>1. 
Most redundant calls on the same symbol (about 80%) happen in quick succession with few intervening calls for other symbols. Most likely we are processing a bunch of relocations right next to each other that all refer to the same symbol (or small set of symbols), e.g. within a TU.</div><div><br></div><div>2. There is a long-ish tail (about 20% of calls to SymbolBody::getVA) that happens at a long temporal distance from any previous call to SymbolBody::getVA on the same symbol. I don't know off the top of my head where these are coming from, but it doesn't sound like relocations. A quick grep shows many source locations that match getVA, so it's hard to tell at a glance. Any ideas where these other calls are coming from?</div><div><br></div><div>The particular link I was looking at was a release link without debug info, using `-O0 --no-gc-sections --no-threads`. The particular test case is LLD itself.</div><span class="gmail-m_5524824443263171716HOEnZb"><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span></div>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div></div>