[cfe-dev] Clang build of ATLAS (and speed comparison)
Evan Cheng
evan.cheng at apple.com
Fri Sep 9 08:35:10 PDT 2011
Can you point us to the source of ATLAS? Does it have a web page?
Evan
On Sep 8, 2011, at 2:22 AM, Vincent Habchi wrote:
> Hi there,
>
> Together with Clint Whaley, the author and maintainer of the ATLAS suite, I am currently evaluating clang/LLVM performance, with a view to supporting clang alongside gcc as a compiler for building ATLAS. On my side, the goal is to add this option to the MacPorts port.
>
> As a preliminary report, here is the output of the ATLAS built-in benchmark (‘make time’) after compiling with the -Oz flag. "Reference" stands for an installation presumably built with GCC 4.x on Linux (Clint, could you elaborate on this?). The machine is a three-year-old MacBook (late 2008, Core 2 Duo), and the compiler is the version of clang shipped with the latest Xcode 4.2:
>
>> Apple clang version 3.0 (tags/Apple/clang-211.9) (based on LLVM 3.0svn)
>
> Dragonegg with gcc 4.5 is used for Fortran compilation, but I don’t think it is relevant here (again, Clint, could you confirm this?). In any case, all of the generated code comes from LLVM.
>
> ---
> The times labeled Reference are for ATLAS as installed by the authors.
> NAMING ABBREVIATIONS:
> kSelMM : selected matmul kernel (may be hand-tuned)
> kGenMM : generated matmul kernel
> kMM_NT : worst no-copy kernel
> kMM_TN : best no-copy kernel
> BIG_MM : large GEMM timing (usually N=1600); estimate of asymptotic peak
> kMV_N : NoTranspose matvec kernel
> kMV_T : Transpose matvec kernel
> kGER : GER (rank-1 update) kernel
> Kernel routines are not called by the user directly, and their
> performance is often somewhat different than the total
> algorithm (eg, dGER perf may differ from dkGER)
>
>
> Reference clock rate=2394Mhz, new rate=2400Mhz
> Refrenc : % of clock rate achieved by reference install
> Present : % of clock rate achieved by present ATLAS install
>
>                     single precision                  double precision
>             ********************************  *******************************
>                  real            complex           real            complex
>             ---------------  ---------------  ---------------  ---------------
> Benchmark   Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
> =========   ======= =======  ======= =======  ======= =======  ======= =======
> kSelMM        646.0   608.2    611.2   571.5    369.1   368.5    355.4   356.5
> kGenMM        185.1   154.5    183.0   153.1    173.7   147.6    176.1   151.1
> kMM_NT        172.7   145.6    166.6   140.9    164.0   132.5    174.6   140.0
> kMM_TN        185.8   150.8    178.6   148.5    174.1   114.3    167.9   132.5
> BIG_MM        603.3   582.4    600.5   590.2    351.2   354.8    341.8   348.9
> kMV_N          74.5   100.5    147.7   161.1     43.8    54.3     88.1    98.5
> kMV_T          70.6   101.4    156.2   176.5     48.7    53.7     90.5   104.4
> kGER           39.5    46.6     81.7    86.8     20.4    26.9     40.4    52.8
> ---
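
For readers unfamiliar with the units: the figures above are percentages of the clock rate, which I read as MFLOPS per MHz times 100. Below is a minimal sketch of the conversion under that assumption. The function name is made up for illustration and is not part of ATLAS; the inputs are the single-precision real kSelMM figures and the two clock rates from the output above.

```python
def pct_of_clock_to_mflops(pct_of_clock, clock_mhz):
    """Convert a '% of clock rate' figure to MFLOPS.

    Assumption: 100 % means one floating-point operation per clock cycle,
    so MFLOPS = (pct / 100) * clock rate in MHz.
    """
    return pct_of_clock / 100.0 * clock_mhz


# Single-precision real kSelMM: reference install (2394 MHz) vs. present (2400 MHz).
reference_mflops = pct_of_clock_to_mflops(646.0, 2394)
present_mflops = pct_of_clock_to_mflops(608.2, 2400)

print(f"reference: {reference_mflops:.1f} MFLOPS")
print(f"present:   {present_mflops:.1f} MFLOPS")
```

Because both columns are already normalized by their own clock rate, the percentages can be compared directly even though the two machines differ by 6 MHz.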
>
> What we see is that, while clang seems to outperform GCC on level 2 BLAS ops (matrix • vector), it is consistently about 20 % slower on level 3 ops (rows 2, 3 and 4 of the table: kGenMM, kMM_NT and kMM_TN).
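
The per-kernel gap can be checked directly from the table. Here is a small sketch that computes the relative change of Present against Refrenc for the single-precision real columns. The dictionary and helper names are mine; only the numbers come from the benchmark output.

```python
# (Refrenc, Present) pairs copied from the single-precision real columns.
SINGLE_REAL = {
    "kGenMM": (185.1, 154.5),
    "kMM_NT": (172.7, 145.6),
    "kMM_TN": (185.8, 150.8),
    "kMV_N": (74.5, 100.5),
    "kMV_T": (70.6, 101.4),
    "kGER": (39.5, 46.6),
}


def relative_change(reference, present):
    """Percent change of `present` relative to `reference` (negative = slower)."""
    return (present - reference) / reference * 100.0


for kernel, (reference, present) in SINGLE_REAL.items():
    print(f"{kernel:8s} {relative_change(reference, present):+6.1f} %")
```

For this column the three generated and no-copy matmul kernels come out between about 15 % and 19 % below the reference, while the matvec and GER kernels come out above it. The other three columns can be checked the same way by swapping in their pairs.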
>
> -O0 produces an oddity that we are currently investigating, so I will not report those results now. The -O3 figures are about the same. This is summarized in the following table, with both the clang and gcc builds done on the MacBook mentioned above (-O2 was used for gcc):
>
>                     single precision                  double precision
>             ********************************  *******************************
>                  real            complex           real            complex
>             ---------------  ---------------  ---------------  ---------------
> Benchmark     Clang  GCC4.5    Clang  GCC4.5    Clang  GCC4.5    Clang  GCC4.5
> =========     =====  ======    =====  ======    =====  ======    =====  ======
> kSelMM        585.5   592.9    556.9   630.0    368.4   368.6    356.4   359.2
> kGenMM        154.4   180.4    153.1   183.4    147.7   165.6    159.4   172.5
> kMM_NT        146.3   165.7    146.0   165.4    132.3   138.9    139.9   161.7
> kMM_TN        150.9   181.2    149.9   180.8    116.3   168.1    134.2   165.8
> BIG_MM        572.7   589.4    581.7   550.2    354.7   353.9    347.8   347.1
> kMV_N          96.9    83.1    164.2   160.3     53.7    48.3    102.7    97.7
> kMV_T          96.3    75.2    173.1   166.2     53.2    55.1     98.2    97.9
> kGER           46.9    41.8     86.3    85.5     26.9    21.7     53.0    42.0
>
> Please note, and this is also important, that clang does not seem to produce correct code at either -O0 or -O3: both builds fail the ATLAS sanity checks, with wrong results at -O0 and with crashes at -O3. The -Oz build does pass, though.
>
> Vincent
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev