[LLVMdev] On LLD performance

Mon Mar 16 23:39:06 PDT 2015

On Tue, Mar 17, 2015 at 7:17 AM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
> On Mon, Mar 16, 2015 at 10:52 PM, Davide Italiano <davide at freebsd.org>
> wrote:
>>
>> On Mon, Mar 16, 2015 at 1:54 AM, Davide Italiano <davide at freebsd.org>
>> wrote:
>> >
>> > Shankar's parallel for per-se didn't introduce any performance benefit
>> > (or regression).
>> > If the change I propose is safe, I would like to see Shankar's change
>> > in (and this on top of it).
>> > I have other related changes coming next, but I would like to tackle
>> > them one at a time.
>> >
>>
>> Here's an update.
>>
>> After http://reviews.llvm.org/D8372 , I updated the profiling data.
>>
>> https://people.freebsd.org/~davide/llvm/lld-03162015.svg
>> It seems now 85% of CPU time is spent inside
>> FileArchive::buildTableOfContents().
>
>
> I'm rather amazed that that patch changed the total CPU time. Just doing the
> work in parallel shouldn't reduce the total CPU time spent on the task. A
> reduction in CPU time would happen though if parallelizing it increased the
> single-threaded performance of the tasks being done in parallel. Perhaps
> using multiple cores means we are using multiple caches, so each thread is
> getting much better single-threaded performance due to reduced memory
> bottlenecking?
>
> -- Sean Silva
>
>>
>> In particular, 35% of the samples are spent inserting into
>> unordered_map, so there's maybe something we can do differently there
>> (e.g., Rui's proposal of a concurrent map doesn't seem that bad).
>>
>> Thanks,
>>
>> --
>> Davide
>>
>> "There are no solved problems; there are only problems that are more
>> or less solved" -- Henri Poincare
>
>

David, thanks for the input. I'll try DenseMap tomorrow and report the results.
Sean, I was surprised by that too. I can't rule out sampling errors
in hwpmc, so I'll repeat the profiling and/or use another profiler to
see whether I can confirm the results.
As for your other reply, I think that would require a more
fine-grained analysis that includes memory bandwidth, cache misses,
etc. I'll try to get to it later this week or over the weekend. For
now, I'm focusing only on CPU profiling.
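
On the unordered_map insertion cost: one cheap thing to check before
swapping containers is how much of it comes from rehashing as the table
grows. The following is only a rough sketch, not LLD code. Member,
SymbolTable, makeMembers and insertAll are made-up names for
illustration, and the real buildTableOfContents() may look quite
different. It counts how many times the bucket array is rebuilt during
insertion, so a default-constructed table can be compared with one that
was pre-sized with reserve().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an archive member; not an LLD type.
struct Member {
  std::string symbol;
  std::size_t offset;
};

// Hypothetical symbol name -> member offset table.
using SymbolTable = std::unordered_map<std::string, std::size_t>;

// Generates n members with distinct synthetic symbol names.
std::vector<Member> makeMembers(std::size_t n) {
  std::vector<Member> members;
  members.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    members.push_back({"sym" + std::to_string(i), i});
  return members;
}

// Inserts every member into the table and returns how many times the
// bucket count changed, i.e. how many rehashes the insertions caused.
std::size_t insertAll(SymbolTable &table, const std::vector<Member> &members) {
  std::size_t rehashes = 0;
  std::size_t buckets = table.bucket_count();
  for (const Member &m : members) {
    table.emplace(m.symbol, m.offset); // an existing entry is left untouched
    if (table.bucket_count() != buckets) {
      ++rehashes;
      buckets = table.bucket_count();
    }
  }
  return rehashes;
}
```

Calling table.reserve(members.size()) before insertAll() should bring the
returned count to zero, since reserve() sizes the bucket array for that
many elements up front. Whether that makes a visible dent in the 35% is
something only a new profile can tell.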

Thanks,

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare