[lld] r232460 - [ELF] Use parallel_for_each for writing.

Rafael Espíndola rafael.espindola at gmail.com
Wed Mar 18 10:17:38 PDT 2015


And 3.7 seconds is pretty slow. lld is much smaller than clang.

Are you linking lld with debug info?

On 18 March 2015 at 13:02, Shankar Easwaran <shankare at codeaurora.org> wrote:
> These are the numbers that I got on my machine :-
>
> With patch
> -----------------
>
>        3678.663899 task-clock                #    0.995 CPUs utilized            ( +-  0.13% )
>                198 context-switches          #    0.000 M/sec                    ( +-  0.91% )
>                  1 CPU-migrations            #    0.000 M/sec                    ( +- 58.63% )
>            461,051 page-faults               #    0.125 M/sec                    ( +-  0.01% )
>     13,655,075,694 cycles                    #    3.712 GHz                      ( +-  0.10% )
>      7,880,958,266 stalled-cycles-frontend   #   57.71% frontend cycles idle     ( +-  0.17% )
>      5,528,478,678 stalled-cycles-backend    #   40.49% backend cycles idle      ( +-  0.20% )
>     14,231,481,304 instructions              #    1.04  insns per cycle
>                                              #    0.55  stalled cycles per insn  ( +-  0.01% )
>      2,855,286,289 branches                  #  776.175 M/sec                    ( +-  0.01% )
>         48,367,719 branch-misses             #    1.69% of all branches          ( +-  0.04% )
>
>        3.697282619 seconds time elapsed                                          ( +-  0.22% )
>
> Without patch
> ----------------------
>        3678.045942 task-clock                #    0.997 CPUs utilized            ( +-  0.13% )
>                182 context-switches          #    0.000 M/sec
>                  1 CPU-migrations            #    0.000 M/sec                    ( +- 92.22% )
>            461,009 page-faults               #    0.125 M/sec                    ( +-  0.00% )
>     13,636,665,496 cycles                    #    3.708 GHz                      ( +-  0.08% )
>      7,872,155,198 stalled-cycles-frontend   #   57.73% frontend cycles idle     ( +-  0.15% )
>      5,520,295,730 stalled-cycles-backend    #   40.48% backend cycles idle      ( +-  0.16% )
>     14,218,218,499 instructions              #    1.04  insns per cycle
>                                              #    0.55  stalled cycles per insn  ( +-  0.00% )
>      2,851,381,196 branches                  #  775.243 M/sec                    ( +-  0.00% )
>         48,362,236 branch-misses             #    1.70% of all branches          ( +-  0.01% )
>
>        3.688849872 seconds time elapsed                                          ( +-  0.13% )
>
> This was with self-hosting lld. Judging by the numbers from perf, the patch
> is not really improving anything.
>
> Shankar Easwaran
>
> On 3/18/2015 11:32 AM, Rafael Espíndola wrote:
>>
>> As with anything threading related, it might also be worth
>> benchmarking it on Windows.
>>
>> On 18 March 2015 at 12:31, Shankar Easwaran <shankare at codeaurora.org>
>> wrote:
>>>
>>> It looks like these are the right numbers. Strange, I don't see a huge
>>> advantage from the patch, which tries to write output sections in
>>> parallel.
>>>
>>>
>>> On 3/18/2015 11:23 AM, Rafael Espíndola wrote:
>>>>
>>>> On 18 March 2015 at 12:14, Shankar Easwaran <shankare at codeaurora.org>
>>>> wrote:
>>>>>
>>>>> Do the numbers repeat across similar tries?
>>>>
>>>> The "-r 20" tells perf to do 20 runs. Repeating the entire thing as a
>>>> sanity check, I got
>>>>
>>>>
>>>> master:
>>>>          1850.315854      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )
>>>>                1,246      context-switches          #    0.673 K/sec
>>>>                    0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>>>>              191,223      page-faults               #    0.103 M/sec                    ( +-  0.00% )
>>>>        5,570,279,746      cycles                    #    3.010 GHz                      ( +-  0.08% )
>>>>        3,076,652,220      stalled-cycles-frontend   #   55.23% frontend cycles idle     ( +-  0.15% )
>>>>      <not supported>      stalled-cycles-backend
>>>>        6,061,467,442      instructions              #    1.09  insns per cycle
>>>>                                                     #    0.51  stalled cycles per insn  ( +-  0.00% )
>>>>        1,262,014,047      branches                  #  682.053 M/sec                    ( +-  0.00% )
>>>>           26,526,169      branch-misses             #    2.10% of all branches          ( +-  0.00% )
>>>>
>>>>          1.852094924 seconds time elapsed                                          ( +-  0.20% )
>>>>
>>>> master minus your patch:
>>>>
>>>>          1837.986418      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>>>>                1,170      context-switches          #    0.637 K/sec
>>>>                    0      cpu-migrations            #    0.000 K/sec
>>>>              191,225      page-faults               #    0.104 M/sec                    ( +-  0.00% )
>>>>        5,517,484,340      cycles                    #    3.002 GHz                      ( +-  0.01% )
>>>>        3,036,583,530      stalled-cycles-frontend   #   55.04% frontend cycles idle     ( +-  0.02% )
>>>>      <not supported>      stalled-cycles-backend
>>>>        6,004,436,870      instructions              #    1.09  insns per cycle
>>>>                                                     #    0.51  stalled cycles per insn  ( +-  0.00% )
>>>>        1,250,685,716      branches                  #  680.465 M/sec                    ( +-  0.00% )
>>>>           26,539,486      branch-misses             #    2.12% of all branches          ( +-  0.00% )
>>>>
>>>>          1.839759787 seconds time elapsed                                          ( +-  0.01% )
>>>>
>>>>
>>>> Cheers,
>>>> Rafael
>>>>
>>>
>>> --
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>>> by
>>> the Linux Foundation
>>>
>
>
>
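
As a rough cross-check of the elapsed times quoted in this thread, the relative difference between the patched and unpatched builds can be computed directly. This is only a sketch: the helper name `relative_change_percent` and the labels are made up for illustration, and the inputs are the "seconds time elapsed" values copied from the perf output above.

```python
# Elapsed wall-clock times in seconds, copied from the perf stat output
# quoted in this thread: (with patch, without patch).
# The labels are illustrative only.
RUNS = {
    "self-hosting lld run": (3.697282619, 3.688849872),
    "master vs. master minus patch": (1.852094924, 1.839759787),
}


def relative_change_percent(with_patch, without_patch):
    """Return the percent change in elapsed time caused by the patch.

    A positive value means the patched build was slower.
    """
    return (with_patch - without_patch) / without_patch * 100.0


for label, (with_patch, without_patch) in RUNS.items():
    change = relative_change_percent(with_patch, without_patch)
    print(f"{label}: {change:+.2f}%")
```

On these figures the patched build comes out about 0.23% slower in the self-hosting run and about 0.67% slower in the earlier run, so neither measurement shows a speedup.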



