[lld] r232460 - [ELF] Use parallel_for_each for writing.

Wed Mar 18 10:33:07 PDT 2015

Yes, the result of the link step that creates lld has debug info.

On 3/18/2015 12:17 PM, Rafael Espíndola wrote:
> And 3.7 seconds is pretty slow. lld is much smaller than clang.
>
> Are you linking lld with debug info?
>
> On 18 March 2015 at 13:02, Shankar Easwaran <shankare at codeaurora.org> wrote:
>> These are the numbers that I got on my machine:
>>
>> With patch
>> -----------------
>>
>>         3678.663899 task-clock                #    0.995 CPUs utilized            ( +-  0.13% )
>>                 198 context-switches          #    0.000 M/sec                    ( +-  0.91% )
>>                   1 CPU-migrations            #    0.000 M/sec                    ( +- 58.63% )
>>             461,051 page-faults               #    0.125 M/sec                    ( +-  0.01% )
>>      13,655,075,694 cycles                    #    3.712 GHz                      ( +-  0.10% )
>>       7,880,958,266 stalled-cycles-frontend   #   57.71% frontend cycles idle     ( +-  0.17% )
>>       5,528,478,678 stalled-cycles-backend    #   40.49% backend cycles idle      ( +-  0.20% )
>>      14,231,481,304 instructions              #    1.04  insns per cycle
>>                                               #    0.55  stalled cycles per insn  ( +-  0.01% )
>>       2,855,286,289 branches                  #  776.175 M/sec                    ( +-  0.01% )
>>          48,367,719 branch-misses             #    1.69% of all branches          ( +-  0.04% )
>>
>>         3.697282619 seconds time elapsed                                          ( +-  0.22% )
>>
>> Without patch
>> ----------------------
>>         3678.045942 task-clock                #    0.997 CPUs utilized            ( +-  0.13% )
>>                 182 context-switches          #    0.000 M/sec
>>                   1 CPU-migrations            #    0.000 M/sec                    ( +- 92.22% )
>>             461,009 page-faults               #    0.125 M/sec                    ( +-  0.00% )
>>      13,636,665,496 cycles                    #    3.708 GHz                      ( +-  0.08% )
>>       7,872,155,198 stalled-cycles-frontend   #   57.73% frontend cycles idle     ( +-  0.15% )
>>       5,520,295,730 stalled-cycles-backend    #   40.48% backend cycles idle      ( +-  0.16% )
>>      14,218,218,499 instructions              #    1.04  insns per cycle
>>                                               #    0.55  stalled cycles per insn  ( +-  0.00% )
>>       2,851,381,196 branches                  #  775.243 M/sec                    ( +-  0.00% )
>>          48,362,236 branch-misses             #    1.70% of all branches          ( +-  0.01% )
>>
>>         3.688849872 seconds time elapsed                                          ( +-  0.13% )
>>
>> This was with self-hosting lld. Judging from the perf numbers, the patch
>> does not really improve performance.
>>
>> Shankar Easwaran
>>
>> On 3/18/2015 11:32 AM, Rafael Espíndola wrote:
>>> As with anything threading related, it might also be worth
>>> benchmarking it on Windows.
>>>
>>> On 18 March 2015 at 12:31, Shankar Easwaran <shankare at codeaurora.org>
>>> wrote:
>>>> It looks like these are the right numbers. Strangely, I don't see a large
>>>> advantage from the patch, which tries to write the output sections in
>>>> parallel.
>>>>
>>>>
>>>> On 3/18/2015 11:23 AM, Rafael Espíndola wrote:
>>>>> On 18 March 2015 at 12:14, Shankar Easwaran <shankare at codeaurora.org>
>>>>> wrote:
>>>>>> Does this repeat with the same numbers across similar tries?
>>>>> The "-r 20" tells perf to do 20 runs. Repeating the entire thing as a
>>>>> sanity check, I got:
>>>>>
>>>>>
>>>>> master:
>>>>>           1850.315854      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )
>>>>>                 1,246      context-switches          #    0.673 K/sec
>>>>>                     0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>>>>>               191,223      page-faults               #    0.103 M/sec                    ( +-  0.00% )
>>>>>         5,570,279,746      cycles                    #    3.010 GHz                      ( +-  0.08% )
>>>>>         3,076,652,220      stalled-cycles-frontend   #   55.23% frontend cycles idle     ( +-  0.15% )
>>>>>       <not supported>      stalled-cycles-backend
>>>>>         6,061,467,442      instructions              #    1.09  insns per cycle
>>>>>                                                      #    0.51  stalled cycles per insn  ( +-  0.00% )
>>>>>         1,262,014,047      branches                  #  682.053 M/sec                    ( +-  0.00% )
>>>>>            26,526,169      branch-misses             #    2.10% of all branches          ( +-  0.00% )
>>>>>
>>>>>           1.852094924 seconds time elapsed                                          ( +-  0.20% )
>>>>>
>>>>> master minus your patch:
>>>>>
>>>>>           1837.986418      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>>>>>                 1,170      context-switches          #    0.637 K/sec
>>>>>                     0      cpu-migrations            #    0.000 K/sec
>>>>>               191,225      page-faults               #    0.104 M/sec                    ( +-  0.00% )
>>>>>         5,517,484,340      cycles                    #    3.002 GHz                      ( +-  0.01% )
>>>>>         3,036,583,530      stalled-cycles-frontend   #   55.04% frontend cycles idle     ( +-  0.02% )
>>>>>       <not supported>      stalled-cycles-backend
>>>>>         6,004,436,870      instructions              #    1.09  insns per cycle
>>>>>                                                      #    0.51  stalled cycles per insn  ( +-  0.00% )
>>>>>         1,250,685,716      branches                  #  680.465 M/sec                    ( +-  0.00% )
>>>>>            26,539,486      branch-misses             #    2.12% of all branches          ( +-  0.00% )
>>>>>
>>>>>           1.839759787 seconds time elapsed                                          ( +-  0.01% )
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Rafael
>>>>>
>>>> --
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>>>> by
>>>> the Linux Foundation
>>>>
>>
