[LLVMdev] On LLD performance
Rui Ueyama
ruiu at google.com
Fri Mar 13 12:55:58 PDT 2015
On Fri, Mar 13, 2015 at 12:35 PM, Shankar Easwaran <shankare at codeaurora.org>
wrote:
> On 3/13/2015 1:59 PM, Rafael Espíndola wrote:
>
>> I will do a run with --merge-strings. This should probably be the
>>> default to match other ELF linkers.
>>>
>> Trying --merge-strings with today's trunk I got
>>
>> * comment got 77 797 bytes smaller.
>> * rodata got 9 394 257 bytes smaller.
>>
> We can significantly improve merge string performance by delaying merging
> strings until the sections/atoms are garbage collected. We do it very early
> in the reader.
>
From my experience, it is very hard to predict whether a change will
significantly improve performance just by looking at the code and
thinking about it.
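
As an illustration of measuring rather than guessing, here is a minimal
sketch of a repeat-and-average timing harness. It is not the run.sh script
mentioned later in this thread, which is not shown; the function name
time_command and the default of 10 runs are assumptions made for the example.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=10):
    """Run argv `runs` times and return (mean_seconds, relative_stdev_percent)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not timed silently.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    # Sample standard deviation relative to the mean, as a percentage.
    spread = statistics.stdev(samples) / mean * 100 if runs > 1 else 0.0
    return mean, spread
```

Calling time_command with the argv list of each link step being compared
gives a mean and a spread per variant, so a difference smaller than the
spread can be treated as noise.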
> Are you using oprofile to get these stats?
>
>
>
>> Comparing with gold, comment now has the same size and rodata is 55
>> 021 bytes bigger.
>>
>> Amusingly, merging strings seems to make lld a bit faster. With
>> today's files I got:
>>
>> lld:
>> ---------------------------------------------------------------------------
>>
>>     1985.256427  task-clock (msec)       # 0.999 CPUs utilized          ( +- 0.07% )
>>           1,152  context-switches        # 0.580 K/sec
>>               0  cpu-migrations          # 0.000 K/sec                  ( +-100.00% )
>>         199,309  page-faults             # 0.100 M/sec
>>   5,970,383,833  cycles                  # 3.007 GHz                    ( +- 0.07% )
>>   3,413,740,580  stalled-cycles-frontend # 57.18% frontend cycles idle  ( +- 0.12% )
>> <not supported>  stalled-cycles-backend
>>   6,240,156,987  instructions            # 1.05 insns per cycle
>>                                          # 0.55 stalled cycles per insn ( +- 0.01% )
>>   1,293,186,347  branches                # 651.395 M/sec                ( +- 0.01% )
>>      26,687,288  branch-misses           # 2.06% of all branches        ( +- 0.00% )
>>
>>     1.987125976  seconds time elapsed                                   ( +- 0.07% )
>> ---------------------------------------------------------------------------
>> lld --merge-strings:
>>
>> ---------------------------------------------------------------------------
>>     1912.735291  task-clock (msec)       # 0.999 CPUs utilized          ( +- 0.10% )
>>           1,152  context-switches        # 0.602 K/sec
>>               0  cpu-migrations          # 0.000 K/sec                  ( +-100.00% )
>>         187,916  page-faults             # 0.098 M/sec                  ( +- 0.00% )
>>   5,749,920,058  cycles                  # 3.006 GHz                    ( +- 0.04% )
>>   3,250,485,516  stalled-cycles-frontend # 56.53% frontend cycles idle  ( +- 0.07% )
>> <not supported>  stalled-cycles-backend
>>   5,987,870,976  instructions            # 1.04 insns per cycle
>>                                          # 0.54 stalled cycles per insn ( +- 0.00% )
>>   1,250,773,036  branches                # 653.919 M/sec                ( +- 0.00% )
>>      27,922,489  branch-misses           # 2.23% of all branches        ( +- 0.00% )
>>
>>     1.914565005  seconds time elapsed                                   ( +- 0.10% )
>> ---------------------------------------------------------------------------
>>
>>
>> gold
>>
>> ---------------------------------------------------------------------------
>>     1000.132594  task-clock (msec)       # 0.999 CPUs utilized          ( +- 0.01% )
>>               0  context-switches        # 0.000 K/sec
>>               0  cpu-migrations          # 0.000 K/sec
>>          77,836  page-faults             # 0.078 M/sec
>>   3,002,431,314  cycles                  # 3.002 GHz                    ( +- 0.01% )
>>   1,404,393,569  stalled-cycles-frontend # 46.78% frontend cycles idle  ( +- 0.02% )
>> <not supported>  stalled-cycles-backend
>>   4,110,576,101  instructions            # 1.37 insns per cycle
>>                                          # 0.34 stalled cycles per insn ( +- 0.00% )
>>     869,160,761  branches                # 869.046 M/sec                ( +- 0.00% )
>>      15,691,670  branch-misses           # 1.81% of all branches        ( +- 0.00% )
>>
>>     1.001044905  seconds time elapsed                                   ( +- 0.01% )
>> ---------------------------------------------------------------------------
>>
>> I have attached the run.sh script I used to collect the numbers.
>>
>> Cheers,
>> Rafael
>>
>
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by the Linux Foundation
>
>
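
To put the three elapsed times quoted above side by side, the arithmetic can
be sketched as follows. The numbers are copied from the perf stat output in
this thread; the helper names pct_change and slowdown are only illustrative.

```python
# Elapsed wall-clock seconds, copied from the quoted perf stat runs.
elapsed = {
    "lld": 1.987125976,
    "lld --merge-strings": 1.914565005,
    "gold": 1.001044905,
}


def pct_change(old, new):
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100


def slowdown(candidate, baseline):
    """How many times longer candidate takes than baseline."""
    return candidate / baseline


merge_effect = pct_change(elapsed["lld"], elapsed["lld --merge-strings"])
vs_gold = slowdown(elapsed["lld --merge-strings"], elapsed["gold"])

print(f"--merge-strings changes lld link time by {merge_effect:.2f}%")
print(f"lld --merge-strings takes {vs_gold:.2f}x as long as gold")
```

With these inputs, --merge-strings makes lld roughly 3.65% faster, and the
result is still a little over 1.9 times gold's elapsed time. Both gaps are
well outside the reported +- 0.10% run-to-run variation.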