[LLVMdev] On LLD performance

Davide Italiano davide at freebsd.org
Sun Mar 15 17:54:04 PDT 2015

On Sun, Mar 15, 2015 at 12:36 AM, Davide Italiano <davide at freebsd.org> wrote:
> On Sat, Mar 14, 2015 at 2:52 AM, Nick Kledzik <kledzik at apple.com> wrote:
>> Rui,
>> Thanks for this write up and the work you are doing to improve performance.
>> A couple of thoughts:
>> * In the old days, linking was I/O bound so the linker run tended to be the
>> same speed as cat’ing all the input files together.  Now that spinning disks
>> are less common, I/O may no longer be the bottleneck.
>> * How does the link time compare to the Windows system linker?
>> * Where exactly in the Resolving phase is the bottleneck?  Is it the symbol
>> table or the other work?  (Is much time spent growing (and rehashing) the
>> DenseMap?  If so, would setting a larger initial bucket size help?)
>> * Regarding Rafael's performance comparison with gold (where lld is
>> about half the speed): the old binutils ld was a cross-platform linker
>> that supported multiple formats.  The goal of gold was to achieve better
>> performance by supporting only ELF.  So I would expect gold to be faster
>> than the general-purpose lld.
>> -Nick
> After my wild and wrong guess from a couple of days ago, I realized it
> was better to spend some time analyzing the code and profiling, rather
> than speculating.
> The machine where I tried to link clang using lld is an 8-core Xeon,
> but I was hardly able to use more than two cores.
> I ran my tests on UFS with a 7200 rpm disk (I can provide details if
> needed). The OS is FreeBSD 11 from about a month ago.
> Flamegraph results (now zoomable):
> https://people.freebsd.org/~davide/llvm/lld.svg
> Nick, to answer your question: DenseMap operations are generally a
> hotspot for this kind of workload, although grow() operations aren't
> a particularly big problem (at least in this case). I see them showing
> up only in addReferencetoSymbol(), where they account for a large part
> of the overall samples. To test my theory I tinkered with the initial
> size of the map, but I wasn't able to measure any non-negligible
> improvement.
> What instead seems to be a bigger problem is SymbolTable::replacement,
> whose DenseMap operations are about 10% of the overall samples. It
> seems that in order to find the replacement for a given atom we need
> to traverse a chain of atoms until the last one is found, which
> results in O(n) calls to find(), where n is the length of the chain.
> I will investigate and see whether we can do something differently.
> Please let me know if anything else shows up in the graph, and/or if
> you have any ideas on how to improve. Sorry that the symbol output is
> a bit verbose; the collector wasn't written with C++ in mind.
> --
> Davide
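
As a follow-up thought on the chained lookup described above: one way to avoid the O(n) chain walk would be path compression, as in union-find, so that after the first traversal every atom in the chain points directly at its final replacement. The sketch below is illustrative only; the names (Atom, ReplacementTable, findReplacement) are hypothetical and not lld's actual API, and it uses std::unordered_map where lld uses DenseMap.

```cpp
#include <unordered_map>

// Hypothetical sketch of the replacement-chain lookup: a replaced atom
// maps to its replacement, possibly forming a chain a -> b -> c. Path
// compression rewrites each visited entry to point straight at the
// final atom, so repeated lookups become near-constant time.
struct Atom { int id; };

struct ReplacementTable {
  // atom -> the atom that replaced it (absent means "not replaced")
  std::unordered_map<const Atom *, const Atom *> next;

  const Atom *findReplacement(const Atom *a) {
    auto it = next.find(a);
    if (it == next.end())
      return a;                                     // a is final
    const Atom *root = findReplacement(it->second); // walk the chain
    it->second = root; // path compression: point directly at the root
    return root;
  }
};
```

With compression, each chain is traversed in full only once; subsequent calls for any atom on it do a single map lookup.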

The following patch:

Index: lib/Core/Resolver.cpp
--- lib/Core/Resolver.cpp       (revision 232330)
+++ lib/Core/Resolver.cpp       (working copy)
@@ -341,7 +341,8 @@
 // to the new defined atom
 void Resolver::updateReferences() {
   ScopedTask task(getDefaultDomain(), "updateReferences");
-  for (const Atom *atom : _atoms) {
+  parallel_for_each(_atoms.begin(), _atoms.end(),
+      [&](const Atom *atom) {
     if (const DefinedAtom *defAtom = dyn_cast<DefinedAtom>(atom)) {
       for (const Reference *ref : *defAtom) {
         // A reference of type kindAssociate should't be updated.
@@ -358,7 +359,7 @@
         const_cast<Reference *>(ref)->setTarget(newTarget);
-  }
+  });

built on top of Shankar's parallel for shaves ~20 seconds off linking
clang on my machine, making the overall link 20% faster.

Results (unpatched):

Results (patched):

Shankar's parallel for by itself didn't introduce any performance
benefit (or regression).
If the change I propose is safe, I would like to see Shankar's change
go in (and this one on top of it).
I have other related changes coming next, but I would like to tackle
them one at a time.



"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare
