[llvm-dev] [lld] We call SymbolBody::getVA redundantly a lot...
Sean Silva via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 28 23:15:21 PST 2017
On Tue, Feb 28, 2017 at 12:10 PM, Rui Ueyama <ruiu at google.com> wrote:
> I don't think getVA is particularly expensive, and if it is not expensive
> I wouldn't cache its result. Did you experiment to cache getVA results? I
> think you can do that fairly easily by adding a std::atomic_uint64_t to
> SymbolBody and using it as a cache for getVA.
You're right, caching it didn't have any significant effect (though I
wasn't measuring super precisely). I think I was remembering the profile
wrong. I remember measuring that we had some very bad cache/TLB misses
here, but I guess those aren't too important on the current profile (at
least, not on this test case; the locality of these accesses depends a lot
on the test case).
Also, it seems like our performance is a lot more stable w.r.t.
InputSectionBase::relocate than it used to be (or maybe my current CPU is
just less affected; it's a desktop-class processor instead of a Xeon).
I took a quick profile of this workload, and it breaks down roughly as:
- 65% in the writer ("backend")
- 30% in the "frontend" (everything called by SymbolTable::addFile)
The frontend work seems to be largely dominated by ObjectFile::parse (as
you would expect), though there is about 10% of total runtime slipping
through the cracks here in various other "frontend" tasks.
The backend work is split about evenly between scanRelocations and
OutputSection::writeTo. InputSectionBase::relocate is only about 10% of the
total runtime (part of OutputSection::writeTo).
Some slightly cleaned up `perf report` output with some more details:
So overall, the profile is basically split 3 ways (about 30% each):
- frontend (reading input files and building the symbol table and
associated data structures)
- scanRelocations (initial pass over relocations)
- writeTo (mostly IO and InputSectionBase::relocate)
-- Sean Silva
> On Tue, Feb 28, 2017 at 4:19 AM, Sean Silva <chisophugis at gmail.com> wrote:
>> tl;dr: it looks like we call SymbolBody::getVA about 5x more often than
>> we need to.
>> Should we cache it or something? (careful with threads).
>> Here is a link to a PDF of my Mathematica notebook which has all the
>> details of my investigation:
>> There seem to be two main regimes in which we redundantly call getVA:
>> 1. most redundant calls on the same symbol (about 80%) happen in quick
>> succession with few intervening calls for other symbols. Most likely we are
>> processing a bunch of relocations right next to each other that all refer
>> to the same symbol (or small set of symbols); e.g. within a TU
>> 2. there is a long-ish tail (about 20% of calls to SymbolBody::getVA)
>> which happen at a long temporal distance from any previous call to
>> SymbolBody::getVA on the same symbol. I don't know off the top of my head
>> where these are coming from, but it doesn't sound like relocations. A quick
>> grep shows a bunch of source locations that match getVA, so it's hard
>> to tell at a glance. Any ideas where these other calls are coming from?
>> The particular link I was looking at was a release (no debug info)
>> link, using `-O0 --no-gc-sections --no-threads`. The particular test case
>> is LLD itself.
>> -- Sean Silva