[LLVMdev] On LLD performance

Davide Italiano davide at freebsd.org
Mon Mar 16 23:39:06 PDT 2015

On Tue, Mar 17, 2015 at 7:17 AM, Sean Silva <chisophugis at gmail.com> wrote:
> On Mon, Mar 16, 2015 at 10:52 PM, Davide Italiano <davide at freebsd.org>
> wrote:
>> On Mon, Mar 16, 2015 at 1:54 AM, Davide Italiano <davide at freebsd.org>
>> wrote:
>> >
>> > Shankar's parallel for per-se didn't introduce any performance benefit
>> > (or regression).
>> > If the change I propose is safe, I would like to see Shankar's change
>> > in (and this on top of it).
>> > I have other related changes coming next, but I would like to tackle
>> > them one at a time.
>> >
>> Here's an update.
>> After http://reviews.llvm.org/D8372 , I updated the profiling data.
>> https://people.freebsd.org/~davide/llvm/lld-03162015.svg
>> It seems now 85% of CPU time is spent inside
>> FileArchive::buildTableOfContents().
> I'm rather amazed that that patch changed the total CPU time. Just doing the
> work in parallel shouldn't reduce the total CPU time spent on the task. A
> reduction in CPU time would happen though if parallelizing it increased the
> single-threaded performance of the tasks being done in parallel. Perhaps
> using multiple cores means we are using multiple caches, so each thread is
> getting much better single-threaded performance due to reduced memory
> bottlenecking?
> -- Sean Silva
>> In particular, 35% of the samples are spent inserting into
>> unordered_map, so there's maybe something we can do differently there
>> (e.g., Rui's proposal of a concurrent map doesn't seem that bad).
>> Thanks,
>> --
>> Davide
>> "There are no solved problems; there are only problems that are more
>> or less solved" -- Henri Poincare
David, thanks for the input. I'll try DenseMap tomorrow and report results.
Sean, I was surprised by that too. I cannot rule out sampling errors
in hwpmc, so I'll repeat the profiling and/or use another profiler to
see if I can confirm the results.
As for your other point, I guess that would require a more
fine-grained analysis that includes memory bandwidth, cache misses,
etc. I'll try to get to it later this week or over the weekend. For
now, I'm just focusing on CPU profiling.
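
For readers following along, here is a minimal sketch of the kind of
insertion loop the unordered_map samples point at. It is not lld's
code: SymbolEntry, SymbolTable and buildTable are hypothetical names
made up for illustration, and calling reserve() up front is just one
generic way to avoid repeated rehashing, not a measured fix for the
profile above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one archive symbol-table entry: a symbol name
// and the index of the archive member that defines it.
struct SymbolEntry {
  std::string name;
  std::size_t memberIndex;
};

// Hypothetical table type: symbol name -> defining member index.
using SymbolTable = std::unordered_map<std::string, std::size_t>;

// Build the name -> member index table. reserve() sizes the bucket array
// once, so the loop does not trigger incremental rehashes as the map grows.
// The first definition of a name wins; later duplicates are ignored.
SymbolTable buildTable(const std::vector<SymbolEntry> &entries) {
  SymbolTable table;
  table.reserve(entries.size());
  for (const SymbolEntry &e : entries)
    table.emplace(e.name, e.memberIndex); // no insertion if name is present
  // Sanity check: duplicates can only shrink the table relative to the input.
  assert(table.size() <= entries.size());
  return table;
}
```

Calling buildTable on the entries {"foo", 0}, {"bar", 1}, {"foo", 2}
would give a two-element table in which "foo" maps to 0. Whether
something like this, DenseMap, or a concurrent map helps in practice
would have to be confirmed by profiling.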



