[llvm-dev] [lld] We call SymbolBody::getVA redundantly a lot...

Tue Feb 28 23:15:21 PST 2017

On Tue, Feb 28, 2017 at 12:10 PM, Rui Ueyama <ruiu at google.com> wrote:

> I don't think getVA is particularly expensive, and if it is not expensive
> I wouldn't cache its result. Did you experiment to cache getVA results? I
> think you can do that fairly easily by adding a std::atomic_uint64_t to
> SymbolBody and use it as a cache for getVA.
>

You're right, caching it didn't have any significant effect (though I
wasn't measuring super precisely ). I think I was remembering the profile
wrong. I remember measuring that we had some very bad cache/TLB misses
here, but I guess those aren't too important on the current profile (at
least, not on this test case; the locality of these accesses depends a lot
on the test case).

Also, it seems like our performance is a lot more stable w.r.t.
InputSectionBase::relocate than it used to be (or maybe my current CPU is
just less affected; it's a desktop class processor instead of a xeon).

I took a quick profile of this workload and it looks like it is:

65% in the writer ("backend")
30% in the "frontend" (everything called by SymbolTable::addFile)

The frontend work seems to be largely dominated by ObjectFile::parse (as
you would expect), though there is about 10% of total runtime slipping
through the cracks here in various other "frontend" tasks.

The backend work is split about evenly between scanRelocations and
OutputSection::writeTo. InputSectionBase::relocate is only about 10% of the
total runtime (part of OutputSection::writeTo).

Some slightly cleaned up `perf report` output with some more details:
https://reviews.llvm.org/P7972

So it seems like overall, the profile is basically split 3 ways (about 30%
each):
- frontend (reading input files and building the symbol table and
associated data structures)
- scanRelocations (initial pass over relocations)
- writeTo (mostly IO and InputSectionBase::relocate)

-- Sean Silva

>
> On Tue, Feb 28, 2017 at 4:19 AM, Sean Silva <chisophugis at gmail.com> wrote:
>
>> tl;dr: it looks like we call SymbolBody::getVA about 5x more times than
>> we need to
>>
>> Should we cache it  or something? (careful with threads).
>>
>>
>> Here is a link to a PDF of my Mathematica notebook which has all the
>> details of my investigation:
>> https://drive.google.com/open?id=0B8v10qJ6EXRxVDQ3YnZtUlFtZ1k
>>
>>
>> There seem to be two main regimes that we redundantly call
>> SymbolBody::getVA:
>>
>> 1. most redundant calls on the same symbol (about 80%) happen in quick
>> succession with few intervening calls for other symbols. Most likely we are
>> processing a bunch of relocations right next to each other that all refer
>> to the same symbol (or small set of symbols); e.g. within a TU
>>
>> 2. there is a long-ish tail (about 20% of calls to SymbolBody::getVA)
>> which happen at a long temporal distance from any previous call to
>> SymbolBody::getVA on the same symbol. I don't know off the top of my head
>> where these are coming from, but it doesn't sound like relocations. A quick
>> grepping shows a bunch of source locations that match getVA, so it's hard
>> at a glance to see. Any ideas where these other calls are coming from?
>>
>> The particular link I was looking at was a release without debug info
>> link, using `-O0 --no-gc-sections --no-threads`. The particular test case
>> is LLD itself.
>>
>> -- Sean Silva
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170228/a91e415b/attachment.html>