[LLVMdev] On LLD performance

Fri Mar 13 12:35:05 PDT 2015

On 3/13/2015 1:59 PM, Rafael Espíndola wrote:
>> I will do a run with --merge-strings. This should probably be the
>> default to match other ELF linkers.
> Trying --merge-strings with today's trunk I got
>
> * comment got 77 797 bytes smaller.
> * rodata got 9 394 257 bytes smaller.
We can significantly improve string-merging performance by delaying the
merge until after the sections/atoms have been garbage collected.
Currently we do it very early, in the reader.
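
To make the idea concrete, here is a minimal sketch in Python. It is not
lld's actual code: the function names, the (name, bytes) representation of
a mergeable section, and the `live` set are all hypothetical. It only
deduplicates NUL-terminated strings from sections that survived garbage
collection, so strings from dead sections are never hashed or copied.

```python
def split_cstrings(data: bytes) -> list[bytes]:
    """Split the contents of a mergeable string section into its strings."""
    parts = data.split(b"\0")
    # A well-formed section ends with a NUL, which leaves one empty
    # trailing piece after the split; drop it.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


def merge_strings(sections, live):
    """Deduplicate strings, looking only at sections whose name is in `live`.

    `sections` is a list of (name, bytes) pairs (hypothetical layout).
    Returns the merged section contents and a string -> offset map.
    """
    offsets = {}
    merged = bytearray()
    for name, data in sections:
        if name not in live:
            continue  # dead section: skip it without touching its strings
        for s in split_cstrings(data):
            if s not in offsets:
                offsets[s] = len(merged)
                merged += s + b"\0"
    return bytes(merged), offsets


# Hypothetical input: the section from b.o was garbage collected.
sections = [
    ("a.o", b"hello\0world\0"),
    ("b.o", b"world\0unused\0"),
    ("c.o", b"hello\0"),
]
live = {"a.o", "c.o"}

merged, offsets = merge_strings(sections, live)
print(merged, offsets)
```

Merging in the reader would have hashed "world" and "unused" from b.o as
well, even though that section is dropped later. The sketch assumes that
liveness is already known when the merge runs.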

Are you using oprofile to get these stats?

>
> Comparing with gold, comment now has the same size and rodata is
> 55 021 bytes bigger.
>
> Amusingly, merging strings seems to make lld a bit faster. With
> today's files I got:
>
> lld:
> ---------------------------------------------------------------------------
>
>         1985.256427      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.07% )
>               1,152      context-switches          #    0.580 K/sec
>                   0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>             199,309      page-faults               #    0.100 M/sec
>       5,970,383,833      cycles                    #    3.007 GHz                      ( +-  0.07% )
>       3,413,740,580      stalled-cycles-frontend   #   57.18% frontend cycles idle     ( +-  0.12% )
>     <not supported>      stalled-cycles-backend
>       6,240,156,987      instructions              #    1.05  insns per cycle
>                                                    #    0.55  stalled cycles per insn  ( +-  0.01% )
>       1,293,186,347      branches                  #  651.395 M/sec                    ( +-  0.01% )
>          26,687,288      branch-misses             #    2.06% of all branches          ( +-  0.00% )
>
>         1.987125976 seconds time elapsed                                                ( +-  0.07% )
> -----------------------------------------------------------------------------------
> lld --merge-strings:
>
> ------------------------------------------------------------------------------
>         1912.735291      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.10% )
>               1,152      context-switches          #    0.602 K/sec
>                   0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>             187,916      page-faults               #    0.098 M/sec                    ( +-  0.00% )
>       5,749,920,058      cycles                    #    3.006 GHz                      ( +-  0.04% )
>       3,250,485,516      stalled-cycles-frontend   #   56.53% frontend cycles idle     ( +-  0.07% )
>     <not supported>      stalled-cycles-backend
>       5,987,870,976      instructions              #    1.04  insns per cycle
>                                                    #    0.54  stalled cycles per insn  ( +-  0.00% )
>       1,250,773,036      branches                  #  653.919 M/sec                    ( +-  0.00% )
>          27,922,489      branch-misses             #    2.23% of all branches          ( +-  0.00% )
>
>         1.914565005 seconds time elapsed                                                ( +-  0.10% )
> ----------------------------------------------------------------------------
>
>
> gold
>
> -------------------------------------------------------------------------------
>         1000.132594      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>                   0      context-switches          #    0.000 K/sec
>                   0      cpu-migrations            #    0.000 K/sec
>              77,836      page-faults               #    0.078 M/sec
>       3,002,431,314      cycles                    #    3.002 GHz                      ( +-  0.01% )
>       1,404,393,569      stalled-cycles-frontend   #   46.78% frontend cycles idle     ( +-  0.02% )
>     <not supported>      stalled-cycles-backend
>       4,110,576,101      instructions              #    1.37  insns per cycle
>                                                    #    0.34  stalled cycles per insn  ( +-  0.00% )
>         869,160,761      branches                  #  869.046 M/sec                    ( +-  0.00% )
>          15,691,670      branch-misses             #    1.81% of all branches          ( +-  0.00% )
>
>         1.001044905 seconds time elapsed                                                ( +-  0.01% )
> -------------------------------------------------------------------------------
>
> I have attached the run.sh script I used to collect the numbers.
>
> Cheers,
> Rafael

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation