[LLVMdev] On LLD performance
Shankar Easwaran
shankare at codeaurora.org
Fri Mar 13 12:35:05 PDT 2015
On 3/13/2015 1:59 PM, Rafael Espíndola wrote:
>> I will do a run with --merge-strings. This should probably be the
>> default to match other ELF linkers.
> Trying --merge-strings with today's trunk I got
>
> * comment got 77 797 bytes smaller.
> * rodata got 9 394 257 bytes smaller.
We can significantly improve string-merging performance by delaying
the merge until after the sections/atoms have been garbage collected.
Right now we do it very early, in the reader.

Are you using oprofile to get these stats?
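
To illustrate the point about delaying the merge, here is a minimal sketch in Python (not lld code). It assumes each mergeable section holds NUL-terminated strings, as an ELF SHF_MERGE|SHF_STRINGS section does, and that a garbage-collection pass has already cleared a `live` flag on dead sections. `StringSection`, `split_strings` and `merge_strings` are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class StringSection:
    name: str
    data: bytes        # NUL-terminated strings, back to back
    live: bool = True  # assumed to be cleared by an earlier GC pass


def split_strings(data: bytes):
    # Assumes the data ends with a NUL; the empty piece after it is dropped.
    return [s + b"\0" for s in data.split(b"\0")[:-1]]


def merge_strings(sections):
    """Merge strings from live sections only.

    Returns the merged bytes, a map from string to output offset, and
    the number of strings that had to be hashed.
    """
    offsets = {}
    out = bytearray()
    hashed = 0
    for sec in sections:
        if not sec.live:
            continue  # dead sections never reach the hash table
        for s in split_strings(sec.data):
            hashed += 1
            if s not in offsets:
                offsets[s] = len(out)
                out += s
    return bytes(out), offsets, hashed


sections = [
    StringSection("a", b"foo\0bar\0"),
    StringSection("b", b"bar\0baz\0", live=False),  # removed by GC
    StringSection("c", b"foo\0qux\0"),
]
merged, offsets, hashed = merge_strings(sections)
```

Merging in the reader would hash all six strings here. Merging after garbage collection hashes only the four that belong to live sections, and "baz" never reaches the output.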
>
> Comparing with gold, comment now has the same size and rodata is
> 55 021 bytes bigger.
>
> Amusingly, merging strings seems to make lld a bit faster. With
> today's files I got:
>
> lld:
> ---------------------------------------------------------------------------
>
>      1985.256427      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.07% )
>            1,152      context-switches          #    0.580 K/sec
>                0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>          199,309      page-faults               #    0.100 M/sec
>    5,970,383,833      cycles                    #    3.007 GHz                      ( +-  0.07% )
>    3,413,740,580      stalled-cycles-frontend   #   57.18% frontend cycles idle     ( +-  0.12% )
>  <not supported>      stalled-cycles-backend
>    6,240,156,987      instructions              #    1.05  insns per cycle
>                                                 #    0.55  stalled cycles per insn  ( +-  0.01% )
>    1,293,186,347      branches                  #  651.395 M/sec                    ( +-  0.01% )
>       26,687,288      branch-misses             #    2.06% of all branches          ( +-  0.00% )
>
>      1.987125976 seconds time elapsed                                          ( +-  0.07% )
> -----------------------------------------------------------------------------------
> lld --merge-strings:
>
> ------------------------------------------------------------------------------
>      1912.735291      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.10% )
>            1,152      context-switches          #    0.602 K/sec
>                0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>          187,916      page-faults               #    0.098 M/sec                    ( +-  0.00% )
>    5,749,920,058      cycles                    #    3.006 GHz                      ( +-  0.04% )
>    3,250,485,516      stalled-cycles-frontend   #   56.53% frontend cycles idle     ( +-  0.07% )
>  <not supported>      stalled-cycles-backend
>    5,987,870,976      instructions              #    1.04  insns per cycle
>                                                 #    0.54  stalled cycles per insn  ( +-  0.00% )
>    1,250,773,036      branches                  #  653.919 M/sec                    ( +-  0.00% )
>       27,922,489      branch-misses             #    2.23% of all branches          ( +-  0.00% )
>
>      1.914565005 seconds time elapsed                                          ( +-  0.10% )
> ----------------------------------------------------------------------------
>
>
> gold
>
> -------------------------------------------------------------------------------
>      1000.132594      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>                0      context-switches          #    0.000 K/sec
>                0      cpu-migrations            #    0.000 K/sec
>           77,836      page-faults               #    0.078 M/sec
>    3,002,431,314      cycles                    #    3.002 GHz                      ( +-  0.01% )
>    1,404,393,569      stalled-cycles-frontend   #   46.78% frontend cycles idle     ( +-  0.02% )
>  <not supported>      stalled-cycles-backend
>    4,110,576,101      instructions              #    1.37  insns per cycle
>                                                 #    0.34  stalled cycles per insn  ( +-  0.00% )
>      869,160,761      branches                  #  869.046 M/sec                    ( +-  0.00% )
>       15,691,670      branch-misses             #    1.81% of all branches          ( +-  0.00% )
>
>      1.001044905 seconds time elapsed                                          ( +-  0.01% )
> -------------------------------------------------------------------------------
>
> I have attached the run.sh script I used to collect the numbers.
>
> Cheers,
> Rafael
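
For a quick read of the elapsed times quoted above, a small Python sketch can put them side by side. The numbers are copied from the three perf stat runs; the dictionary keys and the helper `pct_faster` are names made up for this example.

```python
# Elapsed wall-clock seconds, copied from the perf stat output above.
elapsed = {
    "lld": 1.987125976,
    "lld --merge-strings": 1.914565005,
    "gold": 1.001044905,
}


def pct_faster(base: float, new: float) -> float:
    """Percentage by which `new` is faster than `base`."""
    return (base - new) / base * 100.0


merge_gain = pct_faster(elapsed["lld"], elapsed["lld --merge-strings"])
ratio_to_gold = elapsed["lld --merge-strings"] / elapsed["gold"]

print(f"--merge-strings is {merge_gain:.2f}% faster than plain lld")
print(f"lld --merge-strings takes {ratio_to_gold:.2f}x as long as gold")
```

So --merge-strings buys lld a few percent, while gold still finishes in a little over half the time.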
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation