[lld] r232460 - [ELF] Use parallel_for_each for writing.
Shankar Easwaran
shankare at codeaurora.org
Thu Mar 19 10:11:45 PDT 2015
Hi Rafael,
Thanks for the info.
Any idea how many context switches gold ends up with?
Shankar Easwaran
On 3/19/2015 11:07 AM, Rafael Espíndola wrote:
> On 18 March 2015 at 20:26, Sean Silva <silvas at purdue.edu> wrote:
>>
>> On Wed, Mar 18, 2015 at 9:38 AM, Rui Ueyama <ruiu at google.com> wrote:
>>> It's not strange. Making something parallel doesn't always make it run
>>> faster. Oftentimes it makes things even slower. That's the whole point why I
>>> emphasized the importance of accurate benchmark. (Note that this is a result
>>> of linking Clang. You might see different results depending on programs.)
>>
>> Actually I'm wondering if we're doing *anything* in parallel, since perf is
>> reporting "0.999 CPUs utilized".
> Gah!
>
> I was wondering about that too and today it hit me: The "-a 0x4" in
> the schedtool invocation was constraining the process to a particular
> core. A leftover from benchmarking too many single threaded things :-(
>
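For anyone decoding the mask: an affinity mask is a bitmask with one bit per CPU, so 0x4 has only bit 2 set and pins the process to CPU 2. That is consistent with perf reporting "0.999 CPUs utilized". A minimal sketch of the decoding follows; the helper name cpus_in_mask is made up for illustration and is not part of schedtool.

```python
def cpus_in_mask(mask: int) -> list:
    """Return the CPU indices whose bits are set in an affinity bitmask."""
    # Bit i of the mask corresponds to CPU i (CPU numbering starts at 0).
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


if __name__ == "__main__":
    # The mask from the schedtool invocation discussed above.
    print(cpus_in_mask(0x4))
```

With a single allowed CPU, no amount of threading in the linker can push utilization above 1.0.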
> I ran the link again (in a fresh build, sorry). What I got was
>
> lld:
>    2544.422504 task-clock (msec)   # 1.837 CPUs utilized   ( +- 0.13% )
>    1.385020712 seconds time elapsed                        ( +- 0.15% )
>
> lld-revert:
>    2465.438485 task-clock (msec)   # 1.655 CPUs utilized   ( +- 0.30% )
>    1.489689761 seconds time elapsed                        ( +- 0.31% )
>
> gold:
>     918.859273 task-clock (msec)   # 0.999 CPUs utilized   ( +- 0.01% )
>    0.919717669 seconds time elapsed                        ( +- 0.01% )
>
> gold --threads:
>    1300.210314 task-clock (msec)   # 1.523 CPUs utilized   ( +- 0.15% )
>    0.853835099 seconds time elapsed                        ( +- 0.25% )
>
>
> So it looks like Shankar's patch does help.
>
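To make the comparison explicit: the "CPUs utilized" figure is just task-clock divided by elapsed wall time, and the effect of the patch is the relative drop in elapsed time between lld-revert and lld. A rough sketch using the numbers quoted above; the function and variable names are mine, chosen only for illustration.

```python
# (task-clock in msec, elapsed time in seconds), copied from the perf output above.
runs = {
    "lld": (2544.422504, 1.385020712),
    "lld-revert": (2465.438485, 1.489689761),
    "gold": (918.859273, 0.919717669),
    "gold --threads": (1300.210314, 0.853835099),
}


def cpus_utilized(task_clock_msec: float, elapsed_sec: float) -> float:
    """CPU time consumed per unit of wall-clock time."""
    return task_clock_msec / (elapsed_sec * 1000.0)


def elapsed_reduction(before_sec: float, after_sec: float) -> float:
    """Fraction of wall-clock time saved going from `before` to `after`."""
    return (before_sec - after_sec) / before_sec


if __name__ == "__main__":
    for name, (task_clock, elapsed) in runs.items():
        print(f"{name:15s} {cpus_utilized(task_clock, elapsed):.3f} CPUs utilized")
    saved = elapsed_reduction(runs["lld-revert"][1], runs["lld"][1])
    print(f"patch saves {saved:.1%} of elapsed time")
```

On these numbers the patch cuts elapsed time by about 7% while total CPU time goes up slightly, so the gain comes from better overlap rather than less work. gold --threads shows the same pattern: more task-clock than single-threaded gold, but less elapsed time.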
> Really sorry about the noise. I will reply to the main "lld
> performance" thread after lunch.
>
> Cheers,
> Rafael
>
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation