[LLVMdev] Clang/llvm performance tests on FreeBSD 10.0-CURRENT
Dimitry Andric
dimitry at andric.com
Tue Sep 4 13:49:39 PDT 2012
Hi all,
I recently performed a series of compiler performance tests on FreeBSD
10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against
clang 3.1 and clang 3.2.
The attached text file[1] contains more information about the tests,
some semi-cooked performance data, and my conclusions. Any errors and
omissions are my own, so if you notice any, please let me know.
The executive summary: clang compiles mostly faster than gcc (sometimes
much faster), and uses significantly less memory.
Finally, please note these tests were purely about compilation speed,
not about the performance of the resulting executables. This still
needs to be tested.
-Dimitry
[1]: Also available at:
<http://www.andric.com/freebsd/perftest/perftest-2012-09-01a.txt>
-------------- next part --------------
COMPILER PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012
==================================================================
INTRODUCTION
------------
The compilers tested were:
- gcc 4.2.1, the system compiler in FreeBSD, which is compiled by gcc 4.2.1.
- gcc 4.7.1, from the official gcc.gnu.org release, compiled via a three-stage
bootstrap, so the final compiler has been compiled by gcc 4.7.1.
- clang 3.1 (branches/release_31 156863), which was the default version of clang
in FreeBSD 10.0-CURRENT before r239462. The executable used was compiled by a
previous copy of itself.
- clang 3.2 (trunk 162107), which is the default version of clang in FreeBSD
10.0-CURRENT as of r239462. The executable used was compiled by a previous
copy of itself.
All tests were run on ref10-amd64.freebsd.org, which is a Dell 2950, 1.86GHz
Core2 Xeon, 2x4 Core, 16G RAM. It runs FreeBSD/amd64 10.0-CURRENT #0 r231914:
Sun Feb 19 17:24:37 UTC 2012.
Each build was repeated 6 times, after cleaning out the object directories, and
syncing. Each build was timed using the system time(1) command, using the -l
argument to obtain rusage information.
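The measurement loop described above can be approximated in a few lines. This
is only a minimal sketch, not the script used for these tests: it uses
Python's os.wait4() in place of time(1) -l, and the names time_build and
clean are assumptions made up for illustration.

```python
import os
import time


def time_build(cmd, runs=6, clean=None):
    """Run cmd `runs` times and return one dict of rusage figures per run.

    `clean` is a hypothetical hook standing in for "clean out the object
    directories and sync" between runs; it is not from the original report.
    """
    results = []
    for _ in range(runs):
        if clean is not None:
            clean()
        start = time.monotonic()
        pid = os.spawnvp(os.P_NOWAIT, cmd[0], cmd)
        # wait4() returns the resource usage of this child, including any
        # descendants it waited for, similar to what time(1) -l reports.
        _, status, ru = os.wait4(pid, 0)
        real = time.monotonic() - start
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError("build command failed: %r" % (cmd,))
        results.append({
            "real": real,
            "user": ru.ru_utime,
            "sys": ru.ru_stime,
            "maxrss": ru.ru_maxrss,  # units are platform-dependent
            "minflt": ru.ru_minflt,
            "majflt": ru.ru_majflt,
            "nvcsw": ru.ru_nvcsw,
            "nivcsw": ru.ru_nivcsw,
        })
    return results
```

A call such as time_build(["make", "all"]) (a placeholder command, not the one
used here) would yield six dicts, one per run, which can then be summarized
per metric.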
The programs tested by compilation were:
- A large C++ program: clang 3.2, as it occurs in the FreeBSD 10.0-CURRENT
source tree as of r239532.
- A medium-large C program: gcc 4.2.1, as it occurs in the FreeBSD 10.0-CURRENT
source tree as of r239532.
- A large C++ library: boost 1.50.0, the officially released version from
<http://www.boost.org/>.
Building a large C++ program (clang 3.2) single-threaded
========================================================
Using clang 3.1:
----------------
N Min Max Median Avg Stddev
real 6 2283.69 2288.46 2285.74 2285.505 1.6470064
user 6 2145.2 2147.2 2146.18 2146.0567 0.68266146
sys 6 128.3 132.08 130.65 130.54833 1.256653
maxrss 6 179264 179264 179264 179264 0
ixrss 6 21407 21436 21420 21419.833 9.6211572
idrss 6 3628 3632 3630 3629.8333 1.3291601
isrss 6 252 252 252 252 0
minflt 6 12485556 12485556 12485556 12485556 0
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 0 0 0 0
oublock 6 2058 2106 2103 2081.3333 25.216397
msgsnd 6 18 18 18 18 0
msgrcv 6 0 0 0 0 0
nsignals 6 1878 1878 1878 1878 0
nvcsw 6 16288 16357 16333 16320.667 29.615311
nivcsw 6 2071535 3998751 3057756 2966314 635381.66
Using clang 3.2:
----------------
N Min Max Median Avg Stddev
real 6 2358.61 2362.84 2362.67 2361.22 1.7831321
user 6 2215.33 2221.13 2218.72 2218.57 2.0094278
sys 6 130.78 134.63 133.41 132.99833 1.4702301
maxrss 6 177796 177796 177796 177796 0
ixrss 6 21388 21413 21408 21400.833 11.052903
idrss 6 3702 3707 3706 3704.6667 2.2509257
isrss 6 253 253 253 253 0
minflt 6 12583827 12583827 12583827 12583827 0
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 0 0 0 0
oublock 6 2036 2074 2071 2054.8333 19.589963
msgsnd 6 18 26 26 23.333333 4.1311822
msgrcv 6 0 0 0 0 0
nsignals 6 1878 1878 1878 1878 0
nvcsw 6 16266 16391 16354 16327.667 53.909801
nivcsw 6 2118900 3891231 3534528 3168715.7 673236.29
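The report does not say which tool produced the N/Min/Max/Median/Avg/Stddev
columns. As a hedged illustration, the sketch below recomputes them with
Python's statistics module, assuming Stddev is the sample standard deviation
(dividing by N - 1). The six msgsnd samples are inferred from the row above,
not taken from the raw data: they are the only integers consistent with its
Min, Max, Median and Avg.

```python
import statistics


def summarize(samples):
    """Return the N/Min/Max/Median/Avg/Stddev columns for one metric."""
    return {
        "N": len(samples),
        "Min": min(samples),
        "Max": max(samples),
        "Median": statistics.median(samples),
        "Avg": statistics.mean(samples),
        # stdev() is the sample standard deviation (divides by N - 1).
        "Stddev": statistics.stdev(samples),
    }


# Inferred, not from the report: the only six integer samples that match
# the msgsnd row (Min 18, Max 26, Median 26, Avg 23.333333).
msgsnd = [18, 18, 26, 26, 26, 26]
row = summarize(msgsnd)
print(row)
```

Under that assumption the computed Stddev is about 4.1311822, which agrees
with the msgsnd row in the table.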
Using gcc 4.2.1:
----------------
N Min Max Median Avg Stddev
real 6 4238.49 4241.76 4240.78 4240.1867 1.2375244
user 6 3903.48 3908.6 3907.58 3906.5583 1.8932661
sys 6 358.38 361.43 359.94 359.94667 1.1494984
maxrss 6 568592 568592 568592 568592 0
ixrss 6 6348 6353 6350 6350.3333 1.6329932
idrss 6 3495 3498 3497 3496.5 1.0488088
isrss 6 146 146 146 146 0
minflt 6 47304156 47304184 47304175 47304172 10.545141
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 0 0 0 0
oublock 6 2620 2754 2732 2683.6667 68.730391
msgsnd 6 0 0 0 0 0
msgrcv 6 0 0 0 0 0
nsignals 6 1878 1878 1878 1878 0
nvcsw 6 67561 67763 67674 67648 75.198404
nivcsw 6 3994087 5821442 4846679 4810028.2 597366.52
Using gcc 4.7.1:
----------------
N Min Max Median Avg Stddev
real 6 3818.41 3974.54 3820.49 3846.7417 62.715466
user 6 3506.86 3591.97 3509.96 3522.8283 33.896088
sys 6 333.58 364.34 338.93 340.70833 11.839839
maxrss 6 480724 480736 480724 480727.33 5.316639
ixrss 6 12173 12198 12194 12188.333 9.4375138
idrss 6 1520 1523 1522 1521.8333 1.1690452
isrss 6 134 134 134 134 0
minflt 6 38406568 38406673 38406592 38406599 38.768544
majflt 6 0 90 0 20.333333 36.45088
nswap 6 0 0 0 0 0
inblock 6 0 4775 0 1233.3333 2028.0327
oublock 6 2266 2301 2286 2284.3333 13.662601
msgsnd 6 30 31 30 30.166667 0.40824829
msgrcv 6 0 0 0 0 0
nsignals 6 1878 1878 1878 1878 0
nvcsw 6 59792 67936 60369 61859.833 3186.3204
nivcsw 6 2867702 4546665 4361653 3753550.8 769382.51
Summary:
--------
For building this specific large C++ program, gcc 4.2.1 is ~86% slower than
clang 3.1 in real time, ~82% slower in user time, and ~176% slower in system
time. The maximum resident set size during building is ~217% larger, and it
causes ~279% more page reclaims.
Though gcc 4.7.1 is faster than its older version, it is still ~68% slower than
clang 3.1 in real time, ~64% slower in user time, and ~161% slower in system
time. The maximum resident set size during building is ~168% larger, and it
causes ~208% more page reclaims.
Finally, clang 3.2 is ~3% slower than clang 3.1 in both real time and user time,
and ~2% slower in system time. The maximum resident set size and the number of
page reclaims during building are approximately equal.
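The percentages in this summary appear to be plain ratios of the Avg columns,
with clang 3.1 as the baseline. A small sketch that reproduces the gcc 4.2.1
figures follows; the helper name pct_more is made up for illustration, and the
numbers are copied from the tables above.

```python
def pct_more(value, baseline):
    """Percentage by which `value` exceeds `baseline`."""
    return (value / baseline - 1.0) * 100.0


# Avg column of the clang 3.1 and gcc 4.2.1 tables above.
clang31 = {"real": 2285.505, "user": 2146.0567, "sys": 130.54833,
           "maxrss": 179264, "minflt": 12485556}
gcc421 = {"real": 4240.1867, "user": 3906.5583, "sys": 359.94667,
          "maxrss": 568592, "minflt": 47304172}

for metric in clang31:
    print("%-7s ~%.0f%%" % (metric, pct_more(gcc421[metric], clang31[metric])))
# real    ~86%
# user    ~82%
# sys     ~176%
# maxrss  ~217%
# minflt  ~279%
```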
Conclusion:
-----------
Clang 3.1 is clearly the fastest compiler for building this specific large C++
program, with clang 3.2 trailing closely behind. Both are significantly faster
than either version of gcc, and use much less memory.
Building a medium-large C program (gcc 4.2.1) single-threaded
=============================================================
Using clang 3.1:
----------------
N Min Max Median Avg Stddev
real 6 303.31 304.06 303.65 303.67167 0.24991332
user 6 275.42 277.09 275.99 276.11167 0.57766484
sys 6 24.92 26.15 25.6 25.656667 0.44643775
maxrss 6 177876 177876 177876 177876 0
ixrss 6 20529 20559 20544 20542.833 12.38413
idrss 6 3618 3623 3621 3620.3333 1.9663842
isrss 6 247 247 247 247 0
minflt 6 2214250 2214250 2214250 2214250 0
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 0 0 0 0
oublock 6 677 677 677 677 0
msgsnd 6 18 18 18 18 0
msgrcv 6 0 0 0 0 0
nsignals 6 883 883 883 883 0
nvcsw 6 5705 5837 5819 5793.6667 49.33018
nivcsw 6 205418 467152 449398 371699.67 114414.58
Using clang 3.2:
----------------
N Min Max Median Avg Stddev
real 6 330.22 331.23 330.95 330.69833 0.43687145
user 6 301.29 302.59 302.3 302.05667 0.49649438
sys 6 26.12 27.19 27.06 26.875 0.39747956
maxrss 6 186260 186260 186260 186260 0
ixrss 6 20639 20674 20660 20656.833 14.469508
idrss 6 3699 3705 3703 3702.3333 2.1602469
isrss 6 316 319 318 317.33333 1.2110601
minflt 6 2290933 2290934 2290934 2290933.7 0.51614557
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 0 0 0 0
oublock 6 668 669 668 668.16667 0.40824829
msgsnd 6 18 18 18 18 0
msgrcv 6 0 0 0 0 0
nsignals 6 883 883 883 883 0
nvcsw 6 5783 5822 5801 5799 17.944358
nivcsw 6 111115 520961 396082 316725.33 164041.32
Using gcc 4.2.1:
----------------
N Min Max Median Avg Stddev
real 6 422.68 425.44 423.23 423.47333 1.0273396
user 6 389.1 391.67 390.58 390.41333 0.82734918
sys 6 36.85 39.2 38.65 38.23 0.83840324
maxrss 6 392560 392560 392560 392560 0
ixrss 6 5529 5542 5534 5534.5 6.0580525
idrss 6 3915 3924 3919 3919 4.1472883
isrss 6 142 142 142 142 0
minflt 6 4055461 4055464 4055463 4055462.7 1.21063
majflt 6 0 4 0 0.66666667 1.6329932
nswap 6 0 0 0 0 0
inblock 6 0 730 0 121.66667 298.02125
oublock 6 659 693 662 667 12.884099
msgsnd 6 0 0 0 0 0
msgrcv 6 0 0 0 0 0
nsignals 6 883 883 883 883 0
nvcsw 6 15645 16454 15874 15888 298.36019
nivcsw 6 121293 661776 414611 363556.83 207101.28
Using gcc 4.7.1:
----------------
N Min Max Median Avg Stddev
real 6 461.58 462.55 462.01 461.98333 0.40287302
user 6 425.22 426.36 425.92 425.835 0.43825791
sys 6 40.83 42.94 41.99 41.925 0.71034499
maxrss 6 445624 445624 445624 445624 0
ixrss 6 10781 10816 10801 10797.5 12.405644
idrss 6 2427 2433 2430 2430 2.1908902
isrss 6 178 178 178 178 0
minflt 6 3883735 3883740 3883739 3883738 2.3664319
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 14 0 2.3333333 5.7154761
oublock 6 677 681 679 679 1.4142136
msgsnd 6 20 20 20 20 0
msgrcv 6 0 0 0 0 0
nsignals 6 883 883 883 883 0
nvcsw 6 16411 16660 16532 16542.333 98.544744
nivcsw 6 284414 901533 384379 449845.33 241447.41
Summary:
--------
For building this specific medium-large C program, gcc 4.2.1 is ~39% slower
than clang 3.1 in real time, ~41% slower in user time, and ~49% slower in
system time. The maximum resident set size during building is ~121% larger,
and it causes ~83% more page reclaims.
For C, gcc 4.7.1 is even slower than its older version; it is ~52% slower than
clang 3.1 in real time, ~54% slower in user time, and ~63% slower in system
time. The maximum resident set size during building is ~151% larger, and it
causes ~75% more page reclaims.
Finally, clang 3.2 is ~9% slower than clang 3.1 in both real time and user
time, and ~5% slower in system time. The maximum resident set size during
building is ~5% larger, and it causes ~3% more page reclaims.
Conclusion:
-----------
Clang 3.1 is clearly the fastest compiler for building this specific
medium-large C program, with clang 3.2 somewhat behind. Both are significantly
faster than either version of gcc, and use much less memory.
Building a large C++ library (boost 1.50.0) single-threaded
===========================================================
Using clang 3.1:
----------------
N Min Max Median Avg Stddev
real 6 1056.69 1060.49 1059.09 1058.6783 1.5028695
user 6 975.49 978.88 978.53 977.55 1.4653464
sys 6 73.75 76.42 74.87 74.95 1.0609618
maxrss 6 212324 216712 213668 214260.67 1774.6309
ixrss 6 22472 22549 22525 22514.5 31.232995
idrss 6 3793 3806 3802 3800.1667 5.492419
isrss 6 276 277 277 276.5 0.54772256
minflt 6 9543701 9543702 9543701 9543701.3 0.51234754
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 0 0 0 0
oublock 6 1453 1461 1457 1455.8333 3.3714487
msgsnd 6 115 115 115 115 0
msgrcv 6 0 0 0 0 0
nsignals 6 0 0 0 0 0
nvcsw 6 7352 7834 7576 7567.1667 167.70023
nivcsw 6 27478 2350999 1699745 1337037.8 1040439
Using clang 3.2:
----------------
N Min Max Median Avg Stddev
real 6 1075.33 1077.94 1076.39 1076.4267 0.93958856
user 6 995.14 997.61 995.43 995.88833 0.9489661
sys 6 72.34 74.67 74.23 73.843333 0.81563881
maxrss 6 208552 211148 210436 209936 921.08458
ixrss 6 22437 22484 22458 22459 19.768662
idrss 6 3869 3878 3873 3873.3333 3.8815804
isrss 6 275 275 275 275 0
minflt 6 9351477 9351478 9351478 9351477.5 0.54772256
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 0 0 0 0
oublock 6 1448 1454 1449 1449.6667 2.4221203
msgsnd 6 115 115 115 115 0
msgrcv 6 0 0 0 0 0
nsignals 6 0 0 0 0 0
nvcsw 6 10481 12934 11049 11105.333 936.9249
nivcsw 6 975292 2383586 1633650 1615797.3 605542.76
Using gcc 4.2.1:
----------------
N Min Max Median Avg Stddev
real 6 1037.86 1047.78 1039.71 1040.21 3.8054592
user 6 938.74 944.49 941.52 941.55667 1.8382999
sys 6 86.37 92.84 89.89 89.57 2.1105639
maxrss 6 560256 560316 560272 560274 21.428952
ixrss 6 6435 6453 6441 6443 7.2663608
idrss 6 3563 3573 3566 3567.5 4.0373258
isrss 6 136 136 136 136 0
minflt 6 12360490 12360492 12360491 12360491 0.63245553
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 4283 0 713.83333 1748.5274
oublock 6 2648 2656 2655 2653.3333 2.9439203
msgsnd 6 44 51 51 49.833333 2.857738
msgrcv 6 0 0 0 0 0
nsignals 6 0 0 0 0 0
nvcsw 6 7897 12281 8004 8696 1757.1741
nivcsw 6 19915 1989580 897003 957452.5 764999.83
Using gcc 4.7.1:
----------------
N Min Max Median Avg Stddev
real 6 1038.13 1041.29 1040.98 1039.92 1.3837774
user 6 937.73 943.59 941.35 941.14167 2.0323919
sys 6 89.19 95.1 91.61 91.588333 2.0745739
maxrss 6 361268 361268 361268 361268 0
ixrss 6 12431 12474 12469 12457.333 17.385818
idrss 6 1547 1552 1551 1549.8333 1.9407902
isrss 6 129 129 129 129 0
minflt 6 10455489 10455489 10455489 10455489 0
majflt 6 0 0 0 0 0
nswap 6 0 0 0 0 0
inblock 6 0 162 0 27 66.136223
oublock 6 2537 2540 2539 2538.5 1.0488088
msgsnd 6 113 113 113 113 0
msgrcv 6 0 0 0 0 0
nsignals 6 0 0 0 0 0
nvcsw 6 7778 7975 7880 7874.1667 78.036957
nivcsw 6 27055 2302383 2187401 1478808.3 946760.86
Summary:
--------
For building this specific large C++ library, clang 3.1 is ~2% slower than gcc
4.2.1 in real time and ~4% slower in user time, but uses ~16% less system time.
Its maximum resident set size during building is ~62% smaller, and it causes
~23% fewer page reclaims.
As before, clang 3.2 is slower than its older version; it is ~3% slower than
gcc 4.2.1 in real time and ~6% slower in user time, but uses ~18% less system
time. Its maximum resident set size is ~63% smaller, and it causes ~24% fewer
page reclaims.
Finally, gcc 4.7.1 is as fast as gcc 4.2.1 in real time and user time, and
~2% slower in system time. The maximum resident set size is ~36% smaller, and
it causes ~15% fewer page reclaims.
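When comparing memory figures, "larger than" and "smaller than" use different
baselines, and a "smaller" percentage can never exceed 100%. The sketch below
shows both directions for the Avg maxrss of clang 3.1 and gcc 4.2.1 from the
tables above; the helper names are made up for illustration.

```python
def pct_larger(value, baseline):
    """How much larger `value` is than `baseline`, in percent of baseline."""
    return (value - baseline) / baseline * 100.0


def pct_smaller(value, baseline):
    """How much smaller `value` is than `baseline`, in percent of baseline."""
    return (baseline - value) / baseline * 100.0


# Avg maxrss from the clang 3.1 and gcc 4.2.1 boost tables above.
clang31_maxrss = 214260.67
gcc421_maxrss = 560274

print("gcc 4.2.1 vs clang 3.1: %.1f%% larger"
      % pct_larger(gcc421_maxrss, clang31_maxrss))
print("clang 3.1 vs gcc 4.2.1: %.1f%% smaller"
      % pct_smaller(clang31_maxrss, gcc421_maxrss))
```

With these inputs, gcc 4.2.1's resident set comes out roughly 161.5% larger
than clang 3.1's, which is the same as clang 3.1's being roughly 61.8% smaller
than gcc 4.2.1's.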
Conclusion:
-----------
gcc 4.2.1 and gcc 4.7.1 are the fastest compilers for building this specific
large C++ library, but neither version of clang is far behind. Both versions
of gcc use quite a bit more memory than either version of clang.
================================================================================
Copyright (c) 2012 Dimitry Andric <dimitry at andric.com>
Verbatim copying and redistribution of this entire text are permitted, provided
this notice is preserved.
================================================================================