[lld] r232460 - [ELF] Use parallel_for_each for writing.

Thu Mar 19 09:07:01 PDT 2015

On 18 March 2015 at 20:26, Sean Silva <silvas at purdue.edu> wrote:
>
>
> On Wed, Mar 18, 2015 at 9:38 AM, Rui Ueyama <ruiu at google.com> wrote:
>>
>> It's not strange. Making something parallel doesn't always make it run
>> faster. Oftentimes it makes things even slower. That's the whole reason I
>> emphasized the importance of accurate benchmarks. (Note that this is a result
>> of linking Clang. You might see different results depending on the program.)
>
>
> Actually I'm wondering if we're doing *anything* in parallel, since perf is
> reporting "0.999 CPUs utilized".

Gah!

I was wondering about that too, and today it hit me: the "-a 0x4" in
the schedtool invocation was constraining the process to a single
core. A leftover from benchmarking too many single-threaded things :-(
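
For anyone wondering why that flag pins the process: an affinity mask is a
bitmask in which bit n stands for CPU n, so 0x4 (binary 100) allows only
CPU 2. A minimal sketch of decoding such a mask; the helper name
cpus_in_mask is made up for illustration and is not part of schedtool:

```python
def cpus_in_mask(mask: int) -> list[int]:
    """Return the CPU indices whose bits are set in an affinity bitmask.

    Hypothetical helper for illustration; assumes bit n maps to CPU n.
    """
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


# 0x4 has only bit 2 set, so the process may run on a single CPU.
print(cpus_in_mask(0x4))
```

With only one CPU in the mask, "0.999 CPUs utilized" is the most any
linker could report, no matter how many threads it starts.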

I ran the link again without the affinity mask (in a fresh build, sorry).
What I got was:

lld:
 2544.422504      task-clock (msec)    #    1.837 CPUs utilized    ( +-  0.13% )
 1.385020712 seconds time elapsed                                  ( +-  0.15% )

lld-revert:
 2465.438485      task-clock (msec)    #    1.655 CPUs utilized    ( +-  0.30% )
 1.489689761 seconds time elapsed                                  ( +-  0.31% )

gold:
  918.859273      task-clock (msec)    #    0.999 CPUs utilized    ( +-  0.01% )
 0.919717669 seconds time elapsed                                  ( +-  0.01% )

gold --threads:
 1300.210314      task-clock (msec)    #    1.523 CPUs utilized    ( +-  0.15% )
 0.853835099 seconds time elapsed                                  ( +-  0.25% )

So it looks like Shankar's patch does help: with it, lld's elapsed time
drops from about 1.490 s (lld-revert) to about 1.385 s, roughly 7%.
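
As a sanity check on the numbers, "CPUs utilized" is task-clock divided by
elapsed wall-clock time, and the effect of the patch is the relative change
in elapsed time. A small sketch using the figures from the perf output in
this mail; the function names are illustrative only:

```python
# (task-clock in msec, elapsed time in seconds), copied from the perf output.
runs = {
    "lld": (2544.422504, 1.385020712),
    "lld-revert": (2465.438485, 1.489689761),
    "gold": (918.859273, 0.919717669),
    "gold --threads": (1300.210314, 0.853835099),
}


def cpus_utilized(task_clock_msec: float, elapsed_sec: float) -> float:
    """CPU time consumed per unit of wall-clock time."""
    return task_clock_msec / (elapsed_sec * 1000.0)


def percent_faster(new_sec: float, old_sec: float) -> float:
    """Reduction in elapsed time of `new` relative to `old`, in percent."""
    return (old_sec - new_sec) / old_sec * 100.0


for name, (task_clock, elapsed) in runs.items():
    print(f"{name}: {cpus_utilized(task_clock, elapsed):.3f} CPUs utilized")

gain = percent_faster(runs["lld"][1], runs["lld-revert"][1])
print(f"lld vs lld-revert: {gain:.1f}% less elapsed time")
```

Note that the patched lld spends slightly more total CPU time than the
revert while finishing sooner, which is what one expects when extra work is
spread across cores.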

Really sorry about the noise. I will reply to the main "lld
performance" thread after lunch.

Cheers,
Rafael