[LLVMdev] On LLD performance

Sean Silva chisophugis at gmail.com
Fri Mar 13 22:16:29 PDT 2015


On Fri, Mar 13, 2015 at 12:38 PM, Eric Christopher <echristo at gmail.com>
wrote:

>
>
> On Fri, Mar 13, 2015 at 12:14 PM Rafael Espíndola <
> rafael.espindola at gmail.com> wrote:
>
>> > I will do a run with --merge-strings. This should probably be the
>> > default to match other ELF linkers.
>>
>> Trying --merge-strings with today's trunk I got
>>
>> * comment got 77 797 bytes smaller.
>> * rodata got 9 394 257 bytes smaller.
>>
>> Comparing with gold, comment now has the same size and rodata is
>> 55 021 bytes bigger.
>>
>> Amusingly, merging strings seems to make lld a bit faster. With
>>
>
> As a side note this is a really good thing. The idea is that the linker
> should largely be I/O bound and not processor bound.
>

The conclusion I draw is only that IO is on the critical path, not that the
entire linking process is currently IO bound. The latter would be something
to be excited about (I haven't measured whether that is the case, but I doubt
it, since it would mean that gold must be getting double the FS throughput).
The former is true for any program whose ultimate goal is to write out a file
and then exit once it has finished writing (by definition the last task is on
the critical path). Few command line programs *don't* fall into this
category.
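
As a rough cross-check, here is a small Python sketch that uses only the
task-clock and elapsed-time figures from the perf stat output quoted in this
thread. The dictionary layout and the helper names (cpu_utilization,
slowdown_vs) are made up for illustration. It assumes that an IO-bound link
would show task-clock noticeably below wall-clock time; that is a way of
reading these counters, not something measured here.

```python
# Figures copied from the perf stat output quoted in this thread.
runs = {
    "lld": {"task_clock_ms": 1985.256427, "elapsed_s": 1.987125976},
    "lld --merge-strings": {"task_clock_ms": 1912.735291, "elapsed_s": 1.914565005},
    "gold": {"task_clock_ms": 1000.132594, "elapsed_s": 1.001044905},
}


def cpu_utilization(run):
    """Fraction of wall-clock time the process spent running on a CPU."""
    return (run["task_clock_ms"] / 1000.0) / run["elapsed_s"]


def slowdown_vs(name, baseline="gold"):
    """Wall-clock time of `name` relative to the baseline run."""
    return runs[name]["elapsed_s"] / runs[baseline]["elapsed_s"]


for name, run in runs.items():
    print(f"{name:20s} cpu={cpu_utilization(run):.3f} "
          f"elapsed vs gold={slowdown_vs(name):.2f}x")
```

All three runs come out at about 0.999 CPUs utilized, and lld takes roughly
twice as long as gold in wall-clock time, which is why I doubt the IO-bound
reading.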

-- Sean Silva


>
> -eric
>
>
>> today's files I got:
>>
>> lld:
>> ---------------------------------------------------------------------------
>>
>>        1985.256427      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.07% )
>>              1,152      context-switches          #    0.580 K/sec
>>                  0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>>            199,309      page-faults               #    0.100 M/sec
>>      5,970,383,833      cycles                    #    3.007 GHz                      ( +-  0.07% )
>>      3,413,740,580      stalled-cycles-frontend   #   57.18% frontend cycles idle     ( +-  0.12% )
>>    <not supported>      stalled-cycles-backend
>>      6,240,156,987      instructions              #    1.05  insns per cycle
>>                                                   #    0.55  stalled cycles per insn  ( +-  0.01% )
>>      1,293,186,347      branches                  #  651.395 M/sec                    ( +-  0.01% )
>>         26,687,288      branch-misses             #    2.06% of all branches          ( +-  0.00% )
>>
>>        1.987125976 seconds time elapsed                                               ( +-  0.07% )
>> ---------------------------------------------------------------------------
>> lld --merge-strings:
>>
>> ---------------------------------------------------------------------------
>>        1912.735291      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.10% )
>>              1,152      context-switches          #    0.602 K/sec
>>                  0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>>            187,916      page-faults               #    0.098 M/sec                    ( +-  0.00% )
>>      5,749,920,058      cycles                    #    3.006 GHz                      ( +-  0.04% )
>>      3,250,485,516      stalled-cycles-frontend   #   56.53% frontend cycles idle     ( +-  0.07% )
>>    <not supported>      stalled-cycles-backend
>>      5,987,870,976      instructions              #    1.04  insns per cycle
>>                                                   #    0.54  stalled cycles per insn  ( +-  0.00% )
>>      1,250,773,036      branches                  #  653.919 M/sec                    ( +-  0.00% )
>>         27,922,489      branch-misses             #    2.23% of all branches          ( +-  0.00% )
>>
>>        1.914565005 seconds time elapsed                                               ( +-  0.10% )
>> ---------------------------------------------------------------------------
>>
>>
>> gold
>>
>> ---------------------------------------------------------------------------
>>        1000.132594      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>>                  0      context-switches          #    0.000 K/sec
>>                  0      cpu-migrations            #    0.000 K/sec
>>             77,836      page-faults               #    0.078 M/sec
>>      3,002,431,314      cycles                    #    3.002 GHz                      ( +-  0.01% )
>>      1,404,393,569      stalled-cycles-frontend   #   46.78% frontend cycles idle     ( +-  0.02% )
>>    <not supported>      stalled-cycles-backend
>>      4,110,576,101      instructions              #    1.37  insns per cycle
>>                                                   #    0.34  stalled cycles per insn  ( +-  0.00% )
>>        869,160,761      branches                  #  869.046 M/sec                    ( +-  0.00% )
>>         15,691,670      branch-misses             #    1.81% of all branches          ( +-  0.00% )
>>
>>        1.001044905 seconds time elapsed                                               ( +-  0.01% )
>> ---------------------------------------------------------------------------
>>
>> I have attached the run.sh script I used to collect the numbers.
>>
>> Cheers,
>> Rafael
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>