[lld] r232460 - [ELF] Use parallel_for_each for writing.

Wed Mar 18 10:02:57 PDT 2015

These are the numbers I got on my machine:

With patch
-----------------

        3678.663899 task-clock                #    0.995 CPUs utilized            ( +-  0.13% )
                198 context-switches          #    0.000 M/sec                    ( +-  0.91% )
                  1 CPU-migrations            #    0.000 M/sec                    ( +- 58.63% )
            461,051 page-faults               #    0.125 M/sec                    ( +-  0.01% )
     13,655,075,694 cycles                    #    3.712 GHz                      ( +-  0.10% )
      7,880,958,266 stalled-cycles-frontend   #   57.71% frontend cycles idle     ( +-  0.17% )
      5,528,478,678 stalled-cycles-backend    #   40.49% backend cycles idle      ( +-  0.20% )
     14,231,481,304 instructions              #    1.04  insns per cycle
                                              #    0.55  stalled cycles per insn  ( +-  0.01% )
      2,855,286,289 branches                  #  776.175 M/sec                    ( +-  0.01% )
         48,367,719 branch-misses             #    1.69% of all branches          ( +-  0.04% )

        3.697282619 seconds time elapsed                                          ( +-  0.22% )

Without patch
----------------------
        3678.045942 task-clock                #    0.997 CPUs utilized            ( +-  0.13% )
                182 context-switches          #    0.000 M/sec
                  1 CPU-migrations            #    0.000 M/sec                    ( +- 92.22% )
            461,009 page-faults               #    0.125 M/sec                    ( +-  0.00% )
     13,636,665,496 cycles                    #    3.708 GHz                      ( +-  0.08% )
      7,872,155,198 stalled-cycles-frontend   #   57.73% frontend cycles idle     ( +-  0.15% )
      5,520,295,730 stalled-cycles-backend    #   40.48% backend cycles idle      ( +-  0.16% )
     14,218,218,499 instructions              #    1.04  insns per cycle
                                              #    0.55  stalled cycles per insn  ( +-  0.00% )
      2,851,381,196 branches                  #  775.243 M/sec                    ( +-  0.00% )
         48,362,236 branch-misses             #    1.70% of all branches          ( +-  0.01% )

        3.688849872 seconds time elapsed                                          ( +-  0.13% )

This was with self-hosting lld. Judging by these perf numbers, the patch
does not really improve anything: elapsed time is 3.697 s with the patch
versus 3.689 s without it.

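To put rough numbers on that, here is a small Python sketch comparing the
"seconds time elapsed" lines quoted in this thread. It assumes that perf's
"( +- x% )" figure is the standard deviation of the mean over the repeated
runs, expressed as a percentage of the mean (my understanding of perf stat,
not something measured here), and that the with/without measurements are
independent, so their absolute errors add in quadrature.

```python
import math

# (mean seconds, "+-" percent) pairs copied from the perf output in this thread.
# Assumption: perf's "+-" is the standard deviation of the mean, in percent.
RUNS = {
    "Shankar, self-hosting lld": ((3.697282619, 0.22), (3.688849872, 0.13)),
    "Rafael, master vs. master minus patch": ((1.852094924, 0.20), (1.839759787, 0.01)),
}


def compare(with_patch, without_patch):
    """Return (relative change %, combined noise %) of with vs. without."""
    (m1, p1), (m0, p0) = with_patch, without_patch
    change = (m1 - m0) / m0 * 100.0
    # Assume independent measurements: absolute errors add in quadrature,
    # then normalize by the baseline (without-patch) mean.
    noise = math.hypot(m1 * p1 / 100.0, m0 * p0 / 100.0) / m0 * 100.0
    return change, noise


for name, (with_patch, without_patch) in RUNS.items():
    change, noise = compare(with_patch, without_patch)
    print(f"{name}: {change:+.2f}% elapsed time (noise ~{noise:.2f}%)")
```

With these inputs it reports about +0.23% against ~0.26% of noise for the
self-hosting run, and +0.67% against ~0.20% for Rafael's runs, i.e. no
measurable speedup from the patch in either case.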
Shankar Easwaran

On 3/18/2015 11:32 AM, Rafael Espíndola wrote:
> As with anything threading related, it might also be worth
> benchmarking it on Windows.
>
> On 18 March 2015 at 12:31, Shankar Easwaran <shankare at codeaurora.org> wrote:
>> It looks like these are the right numbers. Strangely, I don't see a big
>> advantage from the patch, which tries to write output sections in
>> parallel.
>>
>>
>> On 3/18/2015 11:23 AM, Rafael Espíndola wrote:
>>> On 18 March 2015 at 12:14, Shankar Easwaran <shankare at codeaurora.org> wrote:
>>>> Do you get the same numbers across repeated tries?
>>> The "-r 20" tells perf to do 20 runs. Repeating the entire thing as a
>>> sanity check, I got:
>>>
>>>
>>> master:
>>>          1850.315854      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )
>>>                1,246      context-switches          #    0.673 K/sec
>>>                    0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>>>              191,223      page-faults               #    0.103 M/sec                    ( +-  0.00% )
>>>        5,570,279,746      cycles                    #    3.010 GHz                      ( +-  0.08% )
>>>        3,076,652,220      stalled-cycles-frontend   #   55.23% frontend cycles idle     ( +-  0.15% )
>>>      <not supported>      stalled-cycles-backend
>>>        6,061,467,442      instructions              #    1.09  insns per cycle
>>>                                                     #    0.51  stalled cycles per insn  ( +-  0.00% )
>>>        1,262,014,047      branches                  #  682.053 M/sec                    ( +-  0.00% )
>>>           26,526,169      branch-misses             #    2.10% of all branches          ( +-  0.00% )
>>>
>>>          1.852094924 seconds time elapsed                                          ( +-  0.20% )
>>>
>>> master minus your patch:
>>>
>>>          1837.986418      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>>>                1,170      context-switches          #    0.637 K/sec
>>>                    0      cpu-migrations            #    0.000 K/sec
>>>              191,225      page-faults               #    0.104 M/sec                    ( +-  0.00% )
>>>        5,517,484,340      cycles                    #    3.002 GHz                      ( +-  0.01% )
>>>        3,036,583,530      stalled-cycles-frontend   #   55.04% frontend cycles idle     ( +-  0.02% )
>>>      <not supported>      stalled-cycles-backend
>>>        6,004,436,870      instructions              #    1.09  insns per cycle
>>>                                                     #    0.51  stalled cycles per insn  ( +-  0.00% )
>>>        1,250,685,716      branches                  #  680.465 M/sec                    ( +-  0.00% )
>>>           26,539,486      branch-misses             #    2.12% of all branches          ( +-  0.00% )
>>>
>>>          1.839759787 seconds time elapsed                                          ( +-  0.01% )
>>>
>>>
>>> Cheers,
>>> Rafael
>>>
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
>> the Linux Foundation
>>

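For reference, the pattern r232460 applies, as I understand it from the
subject line: each output section is copied into its own non-overlapping
range of the output buffer, so the per-section writes can be handed to
parallel_for_each instead of a sequential loop. Below is a minimal Python
sketch of that pattern only. OutputSection, write_sequential and
write_parallel are hypothetical names, ThreadPoolExecutor stands in for
lld's parallel_for_each, and this is not lld's code. Under CPython's GIL it
illustrates the structure, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


class OutputSection:
    """Hypothetical stand-in for an output section: file offset plus contents."""

    def __init__(self, name, file_offset, data):
        self.name = name
        self.file_offset = file_offset
        self.data = data

    def write_to(self, buf):
        # Each section touches only [file_offset, file_offset + len(data)),
        # so writes from different sections never overlap.
        buf[self.file_offset:self.file_offset + len(self.data)] = self.data


def write_sequential(sections, size):
    """Baseline: write every section one after another."""
    buf = bytearray(size)
    for sec in sections:
        sec.write_to(buf)
    return buf


def write_parallel(sections, size, workers=4):
    """Same result, but each section's write runs as a separate task."""
    buf = bytearray(size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises any exception from a worker.
        list(pool.map(lambda sec: sec.write_to(buf), sections))
    return buf


# Usage: a toy 14-byte "file" with three sections laid out back to back.
sections = [
    OutputSection(".text", 0, b"\x90" * 8),
    OutputSection(".data", 8, b"DATA"),
    OutputSection(".rodata", 12, b"RO"),
]
assert write_parallel(sections, 14) == write_sequential(sections, 14)
```

Because the ranges are disjoint, the parallel version needs no locking and
must produce byte-for-byte the same output as the sequential loop.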