[LLVMdev] On LLD performance

Fri Mar 13 12:55:58 PDT 2015

On Fri, Mar 13, 2015 at 12:35 PM, Shankar Easwaran <shankare at codeaurora.org>
wrote:

> On 3/13/2015 1:59 PM, Rafael Espíndola wrote:
>
>>> I will do a run with --merge-strings. This should probably be the
>>> default to match other ELF linkers.
>>>
>> Trying --merge-strings with today's trunk I got
>>
>> * comment got 77 797 bytes smaller.
>> * rodata got 9 394 257 bytes smaller.
>>
> We can significantly improve string-merging performance by delaying the
> merge until after the sections/atoms are garbage collected. Currently we
> do it very early, in the reader.
>
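
For readers unfamiliar with the idea, the proposal quoted above (merge strings only after dead sections have been discarded) can be illustrated with a minimal Python sketch. It is not lld's implementation: the function name `merge_strings`, the dict-of-bytes section model, and the `live` set are hypothetical, and it assumes every section holds well-formed NUL-terminated strings. It does plain deduplication only, with no tail merging.

```python
def merge_strings(sections, live):
    """Deduplicate NUL-terminated strings, looking only at live sections.

    sections: dict mapping a (hypothetical) section name to its raw bytes.
    live:     set of section names that survived garbage collection.
    Returns the merged string table and a map from string to its offset.
    """
    offsets = {}          # string bytes -> offset in the merged table
    merged = bytearray()  # the merged output section
    for name, data in sections.items():
        if name not in live:
            continue      # dead section: its strings are never hashed
        # Assumes data ends with NUL, so the final split piece is empty.
        for s in data.split(b"\0")[:-1]:
            if s not in offsets:
                offsets[s] = len(merged)
                merged += s + b"\0"
    return bytes(merged), offsets


sections = {
    "a": b"hello\0world\0",
    "b": b"world\0dead\0",   # assumed to be garbage collected
    "c": b"hello\0",
}
merged, offsets = merge_strings(sections, live={"a", "c"})
print(merged, offsets)
```

The point of the ordering is that strings in discarded sections are never inserted into the table at all. Whether that saves measurable time in practice is exactly what would need to be profiled.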

In my experience, it's very hard to predict whether a change will
significantly improve performance just by looking at the code and thinking
about it.

> Are you using oprofile to get these stats?
>
>
>
>> Comparing with gold, comment now has the same size and rodata is
>> 55 021 bytes bigger.
>>
>> Amusingly, merging strings seems to make lld a bit faster. With
>> today's files I got:
>>
>> lld:
>> ---------------------------------------------------------------------------
>>
>>         1985.256427      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.07% )
>>               1,152      context-switches          #    0.580 K/sec
>>                   0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>>             199,309      page-faults               #    0.100 M/sec
>>       5,970,383,833      cycles                    #    3.007 GHz                      ( +-  0.07% )
>>       3,413,740,580      stalled-cycles-frontend   #   57.18% frontend cycles idle     ( +-  0.12% )
>>     <not supported>      stalled-cycles-backend
>>       6,240,156,987      instructions              #    1.05  insns per cycle
>>                                                    #    0.55  stalled cycles per insn  ( +-  0.01% )
>>       1,293,186,347      branches                  #  651.395 M/sec                    ( +-  0.01% )
>>          26,687,288      branch-misses             #    2.06% of all branches          ( +-  0.00% )
>>
>>         1.987125976 seconds time elapsed                                          ( +-  0.07% )
>> ---------------------------------------------------------------------------
>> lld --merge-strings:
>>
>> ---------------------------------------------------------------------------
>>         1912.735291      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.10% )
>>               1,152      context-switches          #    0.602 K/sec
>>                   0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>>             187,916      page-faults               #    0.098 M/sec                    ( +-  0.00% )
>>       5,749,920,058      cycles                    #    3.006 GHz                      ( +-  0.04% )
>>       3,250,485,516      stalled-cycles-frontend   #   56.53% frontend cycles idle     ( +-  0.07% )
>>     <not supported>      stalled-cycles-backend
>>       5,987,870,976      instructions              #    1.04  insns per cycle
>>                                                    #    0.54  stalled cycles per insn  ( +-  0.00% )
>>       1,250,773,036      branches                  #  653.919 M/sec                    ( +-  0.00% )
>>          27,922,489      branch-misses             #    2.23% of all branches          ( +-  0.00% )
>>
>>         1.914565005 seconds time elapsed                                          ( +-  0.10% )
>> ---------------------------------------------------------------------------
>>
>>
>> gold
>>
>> ---------------------------------------------------------------------------
>>         1000.132594      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>>                   0      context-switches          #    0.000 K/sec
>>                   0      cpu-migrations            #    0.000 K/sec
>>              77,836      page-faults               #    0.078 M/sec
>>       3,002,431,314      cycles                    #    3.002 GHz                      ( +-  0.01% )
>>       1,404,393,569      stalled-cycles-frontend   #   46.78% frontend cycles idle     ( +-  0.02% )
>>     <not supported>      stalled-cycles-backend
>>       4,110,576,101      instructions              #    1.37  insns per cycle
>>                                                    #    0.34  stalled cycles per insn  ( +-  0.00% )
>>         869,160,761      branches                  #  869.046 M/sec                    ( +-  0.00% )
>>          15,691,670      branch-misses             #    1.81% of all branches          ( +-  0.00% )
>>
>>         1.001044905 seconds time elapsed                                          ( +-  0.01% )
>> ---------------------------------------------------------------------------
>>
>> I have attached the run.sh script I used to collect the numbers.
>>
>> Cheers,
>> Rafael
>>
>
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by the Linux Foundation
>
>
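
To put the quoted timings in proportion, here is a small Python sketch of the arithmetic. The elapsed times are copied from the perf stat output above; the helper names `percent_faster` and `slowdown` are made up for illustration.

```python
# Mean elapsed wall-clock times in seconds, copied from the quoted perf stat runs.
elapsed = {
    "lld": 1.987125976,
    "lld --merge-strings": 1.914565005,
    "gold": 1.001044905,
}


def percent_faster(before, after):
    """Relative reduction in elapsed time, as a percentage of `before`."""
    return (before - after) / before * 100.0


def slowdown(candidate, baseline):
    """How many times longer `candidate` takes than `baseline`."""
    return candidate / baseline


merge_gain = percent_faster(elapsed["lld"], elapsed["lld --merge-strings"])
vs_gold = slowdown(elapsed["lld --merge-strings"], elapsed["gold"])

print(f"--merge-strings saves {merge_gain:.2f}% of lld's elapsed time")
print(f"lld --merge-strings takes {vs_gold:.2f}x as long as gold")
```

With these numbers, the saving from --merge-strings is about 3.65%, which is well above the reported run-to-run variation of roughly 0.1%, while lld still takes about 1.91 times as long as gold.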