[lld] r232460 - [ELF] Use parallel_for_each for writing.

Wed Mar 18 09:32:48 PDT 2015

As with anything threading-related, it might also be worth
benchmarking it on Windows.
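To put the perf numbers quoted below in perspective, here is a small Python sketch that computes the relative change in a few of the reported means (task-clock, elapsed time, instructions). The values are copied from the two "perf stat -r 20" reports in this thread. The exceeds_noise helper is hypothetical: it is a rough check that treats each "+-" figure as an independent relative uncertainty of the mean. That is an assumption about what perf reports, not a documented guarantee.

```python
import math

# Means copied from the two "perf stat -r 20" reports quoted in this thread.
MASTER = {
    "task_clock_ms": 1850.315854,
    "elapsed_s": 1.852094924,
    "instructions": 6_061_467_442,
}
WITHOUT_PATCH = {
    "task_clock_ms": 1837.986418,
    "elapsed_s": 1.839759787,
    "instructions": 6_004_436_870,
}


def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after` (negative means `after` is smaller)."""
    return (after - before) / before * 100.0


def compare(a: dict, b: dict) -> dict:
    """Relative change for every metric present in both runs."""
    return {key: relative_change(a[key], b[key]) for key in a if key in b}


def exceeds_noise(change_pct: float, noise_a_pct: float, noise_b_pct: float) -> bool:
    """Hypothetical rough check: is the change larger than the two "+-"
    figures combined in quadrature? Assumes they are independent relative
    uncertainties of the means."""
    return abs(change_pct) > math.hypot(noise_a_pct, noise_b_pct)


if __name__ == "__main__":
    changes = compare(MASTER, WITHOUT_PATCH)
    for metric, pct in changes.items():
        print(f"{metric:>14}: {pct:+.2f}%")
    # Elapsed time was reported as +- 0.20% (master) and +- 0.01% (without patch).
    print("elapsed change beyond noise:", exceeds_noise(changes["elapsed_s"], 0.20, 0.01))
```

With these inputs, both task-clock and wall-clock time come out roughly 0.67% lower without the patch, and about 0.94% fewer instructions are executed. Under the stated noise assumption, that difference is larger than the reported run-to-run variation.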

On 18 March 2015 at 12:31, Shankar Easwaran <shankare at codeaurora.org> wrote:
> It looks like these are the right numbers. Strangely, I don't see much
> advantage from the patch that parallelizes writing output sections.
>
>
> On 3/18/2015 11:23 AM, Rafael Espíndola wrote:
>>
>> On 18 March 2015 at 12:14, Shankar Easwaran <shankare at codeaurora.org> wrote:
>>>
>>> Do you get the same numbers if you repeat the measurement?
>>
>> The "-r 20" option tells perf to do 20 runs. Repeating the entire thing
>> as a sanity check, I got:
>>
>>
>> master:
>>         1850.315854      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )
>>               1,246      context-switches          #    0.673 K/sec
>>                   0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>>             191,223      page-faults               #    0.103 M/sec                    ( +-  0.00% )
>>       5,570,279,746      cycles                    #    3.010 GHz                      ( +-  0.08% )
>>       3,076,652,220      stalled-cycles-frontend   #   55.23% frontend cycles idle     ( +-  0.15% )
>>     <not supported>      stalled-cycles-backend
>>       6,061,467,442      instructions              #    1.09  insns per cycle
>>                                                    #    0.51  stalled cycles per insn  ( +-  0.00% )
>>       1,262,014,047      branches                  #  682.053 M/sec                    ( +-  0.00% )
>>          26,526,169      branch-misses             #    2.10% of all branches          ( +-  0.00% )
>>
>>         1.852094924 seconds time elapsed    ( +-  0.20% )
>>
>> master minus your patch:
>>
>>         1837.986418      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>>               1,170      context-switches          #    0.637 K/sec
>>                   0      cpu-migrations            #    0.000 K/sec
>>             191,225      page-faults               #    0.104 M/sec                    ( +-  0.00% )
>>       5,517,484,340      cycles                    #    3.002 GHz                      ( +-  0.01% )
>>       3,036,583,530      stalled-cycles-frontend   #   55.04% frontend cycles idle     ( +-  0.02% )
>>     <not supported>      stalled-cycles-backend
>>       6,004,436,870      instructions              #    1.09  insns per cycle
>>                                                    #    0.51  stalled cycles per insn  ( +-  0.00% )
>>       1,250,685,716      branches                  #  680.465 M/sec                    ( +-  0.00% )
>>          26,539,486      branch-misses             #    2.12% of all branches          ( +-  0.00% )
>>
>>         1.839759787 seconds time elapsed    ( +-  0.01% )
>>
>>
>> Cheers,
>> Rafael
>>
>