[lld] r232460 - [ELF] Use parallel_for_each for writing.
Shankar Easwaran
shankare at codeaurora.org
Wed Mar 18 10:33:07 PDT 2015
Yes, the lld binary produced by the link step has debug info.
On 3/18/2015 12:17 PM, Rafael Espíndola wrote:
> And 3.7 seconds is pretty slow. lld is much smaller than clang.
>
> Are you linking lld with debug info?
>
> On 18 March 2015 at 13:02, Shankar Easwaran <shankare at codeaurora.org> wrote:
>> These are the numbers I got on my machine:
>>
>> With patch
>> -----------------
>>
>>     3678.663899 task-clock               #    0.995 CPUs utilized            ( +-  0.13% )
>>             198 context-switches         #    0.000 M/sec                    ( +-  0.91% )
>>               1 CPU-migrations           #    0.000 M/sec                    ( +- 58.63% )
>>         461,051 page-faults              #    0.125 M/sec                    ( +-  0.01% )
>>  13,655,075,694 cycles                   #    3.712 GHz                      ( +-  0.10% )
>>   7,880,958,266 stalled-cycles-frontend  #   57.71% frontend cycles idle     ( +-  0.17% )
>>   5,528,478,678 stalled-cycles-backend   #   40.49% backend cycles idle      ( +-  0.20% )
>>  14,231,481,304 instructions             #    1.04  insns per cycle
>>                                          #    0.55  stalled cycles per insn  ( +-  0.01% )
>>   2,855,286,289 branches                 #  776.175 M/sec                    ( +-  0.01% )
>>      48,367,719 branch-misses            #    1.69% of all branches          ( +-  0.04% )
>>
>>     3.697282619 seconds time elapsed                                         ( +-  0.22% )
>>
>> Without patch
>> ----------------------
>>     3678.045942 task-clock               #    0.997 CPUs utilized            ( +-  0.13% )
>>             182 context-switches         #    0.000 M/sec
>>               1 CPU-migrations           #    0.000 M/sec                    ( +- 92.22% )
>>         461,009 page-faults              #    0.125 M/sec                    ( +-  0.00% )
>>  13,636,665,496 cycles                   #    3.708 GHz                      ( +-  0.08% )
>>   7,872,155,198 stalled-cycles-frontend  #   57.73% frontend cycles idle     ( +-  0.15% )
>>   5,520,295,730 stalled-cycles-backend   #   40.48% backend cycles idle      ( +-  0.16% )
>>  14,218,218,499 instructions             #    1.04  insns per cycle
>>                                          #    0.55  stalled cycles per insn  ( +-  0.00% )
>>   2,851,381,196 branches                 #  775.243 M/sec                    ( +-  0.00% )
>>      48,362,236 branch-misses            #    1.70% of all branches          ( +-  0.01% )
>>
>>     3.688849872 seconds time elapsed                                         ( +-  0.13% )
>>
>> This was measured while self-hosting lld (using lld to link lld). Going by
>> these perf numbers, the patch does not measurably improve link time.
>>
>> Shankar Easwaran
>>
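One way to check whether the differences in elapsed time are larger than the run-to-run spread is the small sketch below. The times and the "+-" percentages are copied from the perf output quoted in this thread. The helper names are hypothetical, and treating the sum of the two reported spreads as the noise threshold is a rough rule of thumb assumed here, not a proper statistical test.

```python
def relative_change(new, old):
    """Fractional change of new relative to old."""
    return (new - old) / old


def compare(label, patched, unpatched):
    """patched/unpatched are (elapsed_seconds, reported_spread_fraction)."""
    delta = relative_change(patched[0], unpatched[0])
    # Assumption: use the sum of both reported spreads as a crude noise bound.
    noise = patched[1] + unpatched[1]
    verdict = "within noise" if abs(delta) <= noise else "outside noise"
    print(f"{label}: change {delta:+.2%}, combined spread {noise:.2%}, {verdict}")
    return delta, noise


# Self-hosted lld link with debug info (first pair of runs quoted above).
shankar = compare("debug-info link", (3.697282619, 0.0022), (3.688849872, 0.0013))
# "master" vs "master minus your patch" (second pair of runs quoted below).
rafael = compare("master runs", (1.852094924, 0.0020), (1.839759787, 0.0001))
```

Under that assumption the first pair differs by less than the combined spread, while the second pair shows the patched build slightly slower by more than the spread. Neither pair shows a speedup.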
>> On 3/18/2015 11:32 AM, Rafael Espíndola wrote:
>>> As with anything threading related, it might also be worth
>>> benchmarking it on Windows.
>>>
>>> On 18 March 2015 at 12:31, Shankar Easwaran <shankare at codeaurora.org>
>>> wrote:
>>>> It looks like these are the right numbers. Strange: I don't see a big
>>>> advantage from the patch, which writes the output sections in
>>>> parallel.
>>>>
>>>>
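For readers unfamiliar with the idea under test, here is a minimal sketch of writing output sections in parallel. It is not lld's code and does not use lld's parallel_for_each. The function name, the (offset, data) section representation, and the worker count are all hypothetical, and a Python thread pool only illustrates the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def write_sections(sections, workers=4):
    """Copy each (offset, data) section into one output buffer in parallel."""
    size = max((off + len(data) for off, data in sections), default=0)
    out = bytearray(size)
    view = memoryview(out)

    def write_one(section):
        off, data = section
        # Sections are assumed to occupy disjoint byte ranges, so no locking.
        view[off:off + len(data)] = data

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises any worker exception.
        list(pool.map(write_one, sections))
    view.release()
    return bytes(out)


# Hypothetical section layout: three sections at fixed file offsets.
sections = [(0, b"\x7fELF"), (8, b".text"), (16, b".data")]
image = write_sections(sections)
```

Each task only copies bytes into its own range of the buffer, so the tasks are independent and need no synchronization.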
>>>> On 3/18/2015 11:23 AM, Rafael Espíndola wrote:
>>>>> On 18 March 2015 at 12:14, Shankar Easwaran <shankare at codeaurora.org>
>>>>> wrote:
>>>>>> Do you get the same numbers when you repeat the measurement?
>>>>> The "-r 20" tells perf to do 20 runs. Repeating the entire thing as a
>>>>> sanity check, I got:
>>>>>
>>>>>
>>>>> master:
>>>>>     1850.315854 task-clock (msec)        #    0.999 CPUs utilized            ( +-  0.20% )
>>>>>           1,246 context-switches         #    0.673 K/sec
>>>>>               0 cpu-migrations           #    0.000 K/sec                    ( +-100.00% )
>>>>>         191,223 page-faults              #    0.103 M/sec                    ( +-  0.00% )
>>>>>   5,570,279,746 cycles                   #    3.010 GHz                      ( +-  0.08% )
>>>>>   3,076,652,220 stalled-cycles-frontend  #   55.23% frontend cycles idle     ( +-  0.15% )
>>>>> <not supported> stalled-cycles-backend
>>>>>   6,061,467,442 instructions             #    1.09  insns per cycle
>>>>>                                          #    0.51  stalled cycles per insn  ( +-  0.00% )
>>>>>   1,262,014,047 branches                 #  682.053 M/sec                    ( +-  0.00% )
>>>>>      26,526,169 branch-misses            #    2.10% of all branches          ( +-  0.00% )
>>>>>
>>>>>     1.852094924 seconds time elapsed                                         ( +-  0.20% )
>>>>>
>>>>> master minus your patch:
>>>>>
>>>>>     1837.986418 task-clock (msec)        #    0.999 CPUs utilized            ( +-  0.01% )
>>>>>           1,170 context-switches         #    0.637 K/sec
>>>>>               0 cpu-migrations           #    0.000 K/sec
>>>>>         191,225 page-faults              #    0.104 M/sec                    ( +-  0.00% )
>>>>>   5,517,484,340 cycles                   #    3.002 GHz                      ( +-  0.01% )
>>>>>   3,036,583,530 stalled-cycles-frontend  #   55.04% frontend cycles idle     ( +-  0.02% )
>>>>> <not supported> stalled-cycles-backend
>>>>>   6,004,436,870 instructions             #    1.09  insns per cycle
>>>>>                                          #    0.51  stalled cycles per insn  ( +-  0.00% )
>>>>>   1,250,685,716 branches                 #  680.465 M/sec                    ( +-  0.00% )
>>>>>      26,539,486 branch-misses            #    2.12% of all branches          ( +-  0.00% )
>>>>>
>>>>>     1.839759787 seconds time elapsed                                         ( +-  0.01% )
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Rafael
>>>>>
>>>> --
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>>>> by
>>>> the Linux Foundation
>>>>