[cfe-dev] Clang build of ATLAS (and speed comparison)
Vincent Habchi
vince at macports.org
Thu Sep 8 02:22:01 PDT 2011
Hi there,
Together with Clint Whaley, the author and maintainer of the ATLAS suite, I am currently evaluating clang/LLVM performance, with the goal of supporting clang alongside gcc as a compiler for building ATLAS. On my side, the aim is to add this option to the MacPorts port.
As a preliminary report, here is the output of the ATLAS built-in benchmark (‘make time’) after compiling with the -Oz flag. Reference stands for an installation presumably built with GCC 4.x on Linux (Clint, could you elaborate on this?). The machine is a three-year-old MacBook (late 2008, Core 2 Duo), and the compiler is the version of clang shipped with the latest Xcode 4.2:
> Apple clang version 3.0 (tags/Apple/clang-211.9) (based on LLVM 3.0svn)
Dragonegg with gcc 4.5 is used for Fortran compilation, but I don’t think that is relevant here (again, Clint, could you confirm this?). In any case, all the generated code comes from LLVM.
---
The times labeled Reference are for ATLAS as installed by the authors.
NAMING ABBREVIATIONS:
kSelMM : selected matmul kernel (may be hand-tuned)
kGenMM : generated matmul kernel
kMM_NT : worst no-copy kernel
kMM_TN : best no-copy kernel
BIG_MM : large GEMM timing (usually N=1600); estimate of asymptotic peak
kMV_N : NoTranspose matvec kernel
kMV_T : Transpose matvec kernel
kGER : GER (rank-1 update) kernel
Kernel routines are not called by the user directly, and their
performance is often somewhat different than the total
algorithm (eg, dGER perf may differ from dkGER)
Reference clock rate=2394Mhz, new rate=2400Mhz
Refrenc : % of clock rate achieved by reference install
Present : % of clock rate achieved by present ATLAS install
                    single precision                  double precision
            ********************************   *******************************
                  real           complex            real           complex
            ---------------  ---------------  ---------------  ---------------
Benchmark   Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
=========   ======= =======  ======= =======  ======= =======  ======= =======
  kSelMM      646.0   608.2    611.2   571.5    369.1   368.5    355.4   356.5
  kGenMM      185.1   154.5    183.0   153.1    173.7   147.6    176.1   151.1
  kMM_NT      172.7   145.6    166.6   140.9    164.0   132.5    174.6   140.0
  kMM_TN      185.8   150.8    178.6   148.5    174.1   114.3    167.9   132.5
  BIG_MM      603.3   582.4    600.5   590.2    351.2   354.8    341.8   348.9
   kMV_N       74.5   100.5    147.7   161.1     43.8    54.3     88.1    98.5
   kMV_T       70.6   101.4    156.2   176.5     48.7    53.7     90.5   104.4
    kGER       39.5    46.6     81.7    86.8     20.4    26.9     40.4    52.8
---
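What we see is that, while clang seems to outperform GCC on level 2 BLAS ops (matrix • vector), it is consistently slower on the level 3 kernels kGenMM, kMM_NT and kMM_TN (rows 2, 3 and 4): by about 15 to 20 % in most columns, and by about 34 % for double-precision real kMM_TN.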
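For anyone who wants to check these percentages, here is a minimal Python sketch that recomputes the relative change of Present against Refrenc for each row. The numbers are copied by hand from the ‘make time’ output above; the function names (percent_change, deltas) are just made up for this illustration and are not part of ATLAS.

```python
# Values copied from the 'make time' table (% of clock rate achieved).
# Column order: single real, single complex, double real, double complex.
REFRENC = {
    "kSelMM": [646.0, 611.2, 369.1, 355.4],
    "kGenMM": [185.1, 183.0, 173.7, 176.1],
    "kMM_NT": [172.7, 166.6, 164.0, 174.6],
    "kMM_TN": [185.8, 178.6, 174.1, 167.9],
    "BIG_MM": [603.3, 600.5, 351.2, 341.8],
    "kMV_N": [74.5, 147.7, 43.8, 88.1],
    "kMV_T": [70.6, 156.2, 48.7, 90.5],
    "kGER": [39.5, 81.7, 20.4, 40.4],
}
PRESENT = {
    "kSelMM": [608.2, 571.5, 368.5, 356.5],
    "kGenMM": [154.5, 153.1, 147.6, 151.1],
    "kMM_NT": [145.6, 140.9, 132.5, 140.0],
    "kMM_TN": [150.8, 148.5, 114.3, 132.5],
    "BIG_MM": [582.4, 590.2, 354.8, 348.9],
    "kMV_N": [100.5, 161.1, 54.3, 98.5],
    "kMV_T": [101.4, 176.5, 53.7, 104.4],
    "kGER": [46.6, 86.8, 26.9, 52.8],
}


def percent_change(ref, new):
    """Change of `new` relative to `ref`, in percent (negative = slower)."""
    return 100.0 * (new - ref) / ref


def deltas():
    """Per-benchmark list of percent changes, one entry per column."""
    return {
        name: [round(percent_change(r, p), 1)
               for r, p in zip(REFRENC[name], PRESENT[name])]
        for name in REFRENC
    }


if __name__ == "__main__":
    print("Benchmark   s-real  s-cplx  d-real  d-cplx")
    for name, row in deltas().items():
        print(f"{name:>9s}  " + "".join(f"{d:+8.1f}" for d in row))
```

Keep in mind that the two installs ran on different machines, so these ratios only compare fractions of each machine's clock rate.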
-O0 produces an oddity that we are still investigating, so I will not report those results yet. The -O3 figures are about the same. The following table compares clang and gcc builds, both done on the same MacBook mentioned above (gcc was run with -O2):
                    single precision                  double precision
            ********************************   *******************************
                  real           complex            real           complex
            ---------------  ---------------  ---------------  ---------------
Benchmark     Clang  GCC4.5    Clang  GCC4.5    Clang  GCC4.5    Clang  GCC4.5
=========   ======= =======  ======= =======  ======= =======  ======= =======
  kSelMM      585.5   592.9    556.9   630.0    368.4   368.6    356.4   359.2
  kGenMM      154.4   180.4    153.1   183.4    147.7   165.6    159.4   172.5
  kMM_NT      146.3   165.7    146.0   165.4    132.3   138.9    139.9   161.7
  kMM_TN      150.9   181.2    149.9   180.8    116.3   168.1    134.2   165.8
  BIG_MM      572.7   589.4    581.7   550.2    354.7   353.9    347.8   347.1
   kMV_N       96.9    83.1    164.2   160.3     53.7    48.3    102.7    97.7
   kMV_T       96.3    75.2    173.1   166.2     53.2    55.1     98.2    97.9
    kGER       46.9    41.8     86.3    85.5     26.9    21.7     53.0    42.0
Please note, and this is also important, that clang does not seem to produce correct code at either -O0 or -O3: both builds fail the ATLAS sanity checks, with wrong results at -O0 and crashes at -O3. The -Oz build does pass, though.
Vincent