[lld] r232460 - [ELF] Use parallel_for_each for writing.
Rafael Espíndola
rafael.espindola at gmail.com
Wed Mar 18 10:17:38 PDT 2015
And 3.7 seconds is pretty slow. lld is much smaller than clang.
Are you linking lld with debug info?
On 18 March 2015 at 13:02, Shankar Easwaran <shankare at codeaurora.org> wrote:
> These are the numbers I got on my machine:
>
> With patch
> -----------------
>
>     3678.663899 task-clock                #    0.995 CPUs utilized            ( +-  0.13% )
>             198 context-switches          #    0.000 M/sec                    ( +-  0.91% )
>               1 CPU-migrations            #    0.000 M/sec                    ( +- 58.63% )
>         461,051 page-faults               #    0.125 M/sec                    ( +-  0.01% )
>  13,655,075,694 cycles                    #    3.712 GHz                      ( +-  0.10% )
>   7,880,958,266 stalled-cycles-frontend   #   57.71% frontend cycles idle     ( +-  0.17% )
>   5,528,478,678 stalled-cycles-backend    #   40.49% backend cycles idle      ( +-  0.20% )
>  14,231,481,304 instructions              #    1.04 insns per cycle
>                                           #    0.55 stalled cycles per insn   ( +-  0.01% )
>   2,855,286,289 branches                  #  776.175 M/sec                    ( +-  0.01% )
>      48,367,719 branch-misses             #    1.69% of all branches          ( +-  0.04% )
>
>     3.697282619 seconds time elapsed                                          ( +-  0.22% )
>
> Without patch
> ----------------------
>     3678.045942 task-clock                #    0.997 CPUs utilized            ( +-  0.13% )
>             182 context-switches          #    0.000 M/sec
>               1 CPU-migrations            #    0.000 M/sec                    ( +- 92.22% )
>         461,009 page-faults               #    0.125 M/sec                    ( +-  0.00% )
>  13,636,665,496 cycles                    #    3.708 GHz                      ( +-  0.08% )
>   7,872,155,198 stalled-cycles-frontend   #   57.73% frontend cycles idle     ( +-  0.15% )
>   5,520,295,730 stalled-cycles-backend    #   40.48% backend cycles idle      ( +-  0.16% )
>  14,218,218,499 instructions              #    1.04 insns per cycle
>                                           #    0.55 stalled cycles per insn   ( +-  0.00% )
>   2,851,381,196 branches                  #  775.243 M/sec                    ( +-  0.00% )
>      48,362,236 branch-misses             #    1.70% of all branches          ( +-  0.01% )
>
>     3.688849872 seconds time elapsed                                          ( +-  0.13% )
>
> This was with self-hosting lld. Judging by the perf numbers, the patch is
> not really improving performance.
>
> Shankar Easwaran
>
> On 3/18/2015 11:32 AM, Rafael Espíndola wrote:
>>
>> As with anything threading related, it might also be worth
>> benchmarking it on Windows.
>>
>> On 18 March 2015 at 12:31, Shankar Easwaran <shankare at codeaurora.org>
>> wrote:
>>>
>>> It looks like these are the right numbers. Strange, I don't see a huge
>>> advantage from the patch, which tries to write output sections in
>>> parallel.
>>>
>>>
>>> On 3/18/2015 11:23 AM, Rafael Espíndola wrote:
>>>>
>>>> On 18 March 2015 at 12:14, Shankar Easwaran <shankare at codeaurora.org>
>>>> wrote:
>>>>>
>>>>> Does this repeat with the same numbers across similar tries?
>>>>
>>>> The "-r 20" tells perf to do 20 runs. Repeating the entire thing as a
>>>> sanity check, I got
>>>>
>>>>
>>>> master:
>>>>     1850.315854 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )
>>>>           1,246 context-switches          #    0.673 K/sec
>>>>               0 cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>>>>         191,223 page-faults               #    0.103 M/sec                    ( +-  0.00% )
>>>>   5,570,279,746 cycles                    #    3.010 GHz                      ( +-  0.08% )
>>>>   3,076,652,220 stalled-cycles-frontend   #   55.23% frontend cycles idle     ( +-  0.15% )
>>>> <not supported> stalled-cycles-backend
>>>>   6,061,467,442 instructions              #    1.09 insns per cycle
>>>>                                           #    0.51 stalled cycles per insn   ( +-  0.00% )
>>>>   1,262,014,047 branches                  #  682.053 M/sec                    ( +-  0.00% )
>>>>      26,526,169 branch-misses             #    2.10% of all branches          ( +-  0.00% )
>>>>
>>>>     1.852094924 seconds time elapsed                                          ( +-  0.20% )
>>>>
>>>> master minus your patch:
>>>>
>>>>     1837.986418 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>>>>           1,170 context-switches          #    0.637 K/sec
>>>>               0 cpu-migrations            #    0.000 K/sec
>>>>         191,225 page-faults               #    0.104 M/sec                    ( +-  0.00% )
>>>>   5,517,484,340 cycles                    #    3.002 GHz                      ( +-  0.01% )
>>>>   3,036,583,530 stalled-cycles-frontend   #   55.04% frontend cycles idle     ( +-  0.02% )
>>>> <not supported> stalled-cycles-backend
>>>>   6,004,436,870 instructions              #    1.09 insns per cycle
>>>>                                           #    0.51 stalled cycles per insn   ( +-  0.00% )
>>>>   1,250,685,716 branches                  #  680.465 M/sec                    ( +-  0.00% )
>>>>      26,539,486 branch-misses             #    2.12% of all branches          ( +-  0.00% )
>>>>
>>>>     1.839759787 seconds time elapsed                                          ( +-  0.01% )
>>>>
>>>>
>>>> Cheers,
>>>> Rafael
>>>>
>>>
>>> --
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>>> by
>>> the Linux Foundation
>>>
>
>
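
For reference, the elapsed times quoted above can be compared directly. The following is only a rough sketch: the helper name `relative_change` and the dictionary labels are made up for illustration, the figures are copied from the two perf reports in this thread, and "master" in the second report is taken to be the build that includes the patch.

```python
def relative_change(with_patch: float, without_patch: float) -> float:
    """Percent change in elapsed time, relative to the unpatched build."""
    return (with_patch - without_patch) / without_patch * 100.0


# "seconds time elapsed" values copied from the perf output quoted above:
# (with patch, without patch)
runs = {
    "Shankar's machine": (3.697282619, 3.688849872),
    "Rafael's machine": (1.852094924, 1.839759787),
}

for label, (patched, unpatched) in runs.items():
    delta = relative_change(patched, unpatched)
    print(f"{label}: {delta:+.2f}% elapsed time with the patch")
```

On both machines the difference comes out below 1%, with the patched build marginally slower, so neither set of numbers shows a speedup from the parallel writing.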