[cfe-dev] Clang build of ATLAS (and speed comparison)

Fri Sep 9 08:35:10 PDT 2011

Can you point us to the source of ATLAS? Does it have a web page?

Evan

On Sep 8, 2011, at 2:22 AM, Vincent Habchi wrote:

> Hi there,
> 
> Together with Clint Whaley, the author and maintainer of the ATLAS suite, we are currently evaluating clang/LLVM performance with a view to supporting clang, alongside gcc, as a compiler for building ATLAS; on my side, the goal is to add this option to the MacPorts port.
> 
> As a preliminary report, here is the output of the ATLAS built-in benchmark (‘make time’) after compiling with the -Oz flag. "Reference" denotes an installation presumably built with GCC 4.x on Linux (Clint, could you elaborate on this?). The machine is a 3-year-old MacBook (late 2008, Core 2 Duo), and the compiler is the version of clang shipped with the latest Xcode 4.2:
> 
>> Apple clang version 3.0 (tags/Apple/clang-211.9) (based on LLVM 3.0svn)
> 
> DragonEgg with gcc 4.5 is used for Fortran compilation, but I don’t think it is relevant here (again, Clint, could you confirm this?). In any case, the resulting code is generated entirely by LLVM.
> 
> ---
> The times labeled Reference are for ATLAS as installed by the authors.
> NAMING ABBREVIATIONS:
>   kSelMM : selected matmul kernel (may be hand-tuned)
>   kGenMM : generated matmul kernel
>   kMM_NT : worst no-copy kernel
>   kMM_TN : best no-copy kernel
>   BIG_MM : large GEMM timing (usually N=1600); estimate of asymptotic peak
>   kMV_N  : NoTranspose matvec kernel
>   kMV_T  : Transpose matvec kernel
>   kGER   : GER (rank-1 update) kernel
> Kernel routines are not called by the user directly, and their
> performance is often somewhat different than the total
> algorithm (eg, dGER perf may differ from dkGER)
> 
> 
> Reference clock rate=2394Mhz, new rate=2400Mhz
>   Refrenc : % of clock rate achieved by reference install
>   Present : % of clock rate achieved by present ATLAS install
> 
>                    single precision                  double precision
>            ********************************   *******************************
>                  real           complex           real           complex
>            ---------------  ---------------  ---------------  ---------------
> Benchmark   Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
> =========   ======= =======  ======= =======  ======= =======  ======= =======
>  kSelMM      646.0   608.2    611.2   571.5    369.1   368.5    355.4   356.5
>  kGenMM      185.1   154.5    183.0   153.1    173.7   147.6    176.1   151.1
>  kMM_NT      172.7   145.6    166.6   140.9    164.0   132.5    174.6   140.0
>  kMM_TN      185.8   150.8    178.6   148.5    174.1   114.3    167.9   132.5
>  BIG_MM      603.3   582.4    600.5   590.2    351.2   354.8    341.8   348.9
>   kMV_N       74.5   100.5    147.7   161.1     43.8    54.3     88.1    98.5
>   kMV_T       70.6   101.4    156.2   176.5     48.7    53.7     90.5   104.4
>    kGER       39.5    46.6     81.7    86.8     20.4    26.9     40.4    52.8
> ---
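A note on reading the figures above: the legend reports performance as a percentage of the clock rate. Assuming that percentage is simply achieved MFLOPS divided by the clock rate in MHz (our reading of the legend, not something the output states explicitly), the small 2394 vs 2400 MHz difference is already normalized away, and any cell can be turned back into absolute MFLOPS with a one-line helper. The function and constant names below are hypothetical.

```python
# Sketch: convert ATLAS "% of clock rate" figures back into MFLOPS.
# Assumption: pct = (achieved MFLOPS / clock rate in MHz) * 100.
# This is our reading of the legend, not a documented ATLAS formula.

REFERENCE_MHZ = 2394  # "Reference clock rate" from the output above
PRESENT_MHZ = 2400    # "new rate" (the MacBook used for this run)


def pct_clock_to_mflops(pct: float, clock_mhz: float) -> float:
    """Turn a '% of clock rate' value into MFLOPS (under the assumption above)."""
    return pct / 100.0 * clock_mhz


# Single-precision real kSelMM row: Refrenc 646.0, Present 608.2
ref = pct_clock_to_mflops(646.0, REFERENCE_MHZ)
present = pct_clock_to_mflops(608.2, PRESENT_MHZ)
print(f"Reference: {ref:.0f} MFLOPS, Present: {present:.0f} MFLOPS")
```

For the single-precision real kSelMM row this works out to roughly 15.5 GFLOPS for the Reference install versus 14.6 GFLOPS for the present build.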
> 
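> What we see is that, while clang seems to outperform GCC on Level 2 BLAS operations (matrix-vector products and rank-1 updates: kMV_N, kMV_T and kGER), it is consistently about 15-20% slower on the Level 3 kernels kGenMM, kMM_NT and kMM_TN (lines 2, 3 and 4 of the table).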
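To check the size of the gap cell by cell, the percentage change of the present (clang -Oz) build relative to the Reference install can be computed directly from the first table. This is a minimal sketch; the numbers are transcribed by hand from the output above and the names are illustrative only.

```python
# Sketch: per-cell percentage change of "Present" relative to "Refrenc",
# using the first benchmark table. Each row holds (Refrenc, Present) pairs
# in column order: single real, single complex, double real, double complex.

TABLE = {
    "kGenMM": [(185.1, 154.5), (183.0, 153.1), (173.7, 147.6), (176.1, 151.1)],
    "kMM_NT": [(172.7, 145.6), (166.6, 140.9), (164.0, 132.5), (174.6, 140.0)],
    "kMM_TN": [(185.8, 150.8), (178.6, 148.5), (174.1, 114.3), (167.9, 132.5)],
    "kMV_N":  [(74.5, 100.5), (147.7, 161.1), (43.8, 54.3), (88.1, 98.5)],
    "kMV_T":  [(70.6, 101.4), (156.2, 176.5), (48.7, 53.7), (90.5, 104.4)],
    "kGER":   [(39.5, 46.6), (81.7, 86.8), (20.4, 26.9), (40.4, 52.8)],
}


def pct_change(reference: float, present: float) -> float:
    """Positive means the present build is faster than the reference."""
    return (present - reference) / reference * 100.0


for kernel, pairs in TABLE.items():
    changes = [pct_change(r, p) for r, p in pairs]
    print(f"{kernel:7s}" + "".join(f"{c:+8.1f}%" for c in changes))
```

On these numbers every Level 2 cell improves, and the one Level 3 cell well outside the 14-21% band is double-precision real kMM_TN, at about -34%.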
> 
> Compiling at -O0 produces an oddity that we are still investigating, so I will not report those results yet. The -O3 figures are about the same as the -Oz ones. The following table compares clang against gcc, with both builds done on the same MacBook described above (-O2 was used for gcc):
> 
>                    single precision                  double precision
>            ********************************   *******************************
>                  real           complex           real           complex
>            ---------------  ---------------  ---------------  ---------------
> Benchmark     Clang  GCC4.5    Clang  GCC4.5    Clang  GCC4.5    Clang  GCC4.5
> =========   ======= =======  ======= =======  ======= =======  ======= =======
>  kSelMM      585.5   592.9    556.9   630.0    368.4   368.6    356.4   359.2
>  kGenMM      154.4   180.4    153.1   183.4    147.7   165.6    159.4   172.5
>  kMM_NT      146.3   165.7    146.0   165.4    132.3   138.9    139.9   161.7
>  kMM_TN      150.9   181.2    149.9   180.8    116.3   168.1    134.2   165.8
>  BIG_MM      572.7   589.4    581.7   550.2    354.7   353.9    347.8   347.1
>   kMV_N       96.9    83.1    164.2   160.3     53.7    48.3    102.7    97.7
>   kMV_T       96.3    75.2    173.1   166.2     53.2    55.1     98.2    97.9
>    kGER       46.9    41.8     86.3    85.5     26.9    21.7     53.0    42.0
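One way to condense the clang vs GCC 4.5 table into a single figure per kernel is the geometric mean of the clang/GCC ratios across the four precision/type columns. This is a sketch of that summary under our own choice of statistic (it is not part of ATLAS's output); values below 1.0 mean the clang build is slower.

```python
from math import prod

# Sketch: geometric mean of clang/gcc ratios per kernel, from the table above.
# Each row holds (clang, gcc) pairs in column order:
# single real, single complex, double real, double complex.
ROWS = {
    "kSelMM": [(585.5, 592.9), (556.9, 630.0), (368.4, 368.6), (356.4, 359.2)],
    "kGenMM": [(154.4, 180.4), (153.1, 183.4), (147.7, 165.6), (159.4, 172.5)],
    "kMM_NT": [(146.3, 165.7), (146.0, 165.4), (132.3, 138.9), (139.9, 161.7)],
    "kMM_TN": [(150.9, 181.2), (149.9, 180.8), (116.3, 168.1), (134.2, 165.8)],
    "BIG_MM": [(572.7, 589.4), (581.7, 550.2), (354.7, 353.9), (347.8, 347.1)],
    "kMV_N":  [(96.9, 83.1), (164.2, 160.3), (53.7, 48.3), (102.7, 97.7)],
    "kMV_T":  [(96.3, 75.2), (173.1, 166.2), (53.2, 55.1), (98.2, 97.9)],
    "kGER":   [(46.9, 41.8), (86.3, 85.5), (26.9, 21.7), (53.0, 42.0)],
}


def geomean_ratio(pairs):
    """Geometric mean of clang/gcc over all (clang, gcc) pairs."""
    ratios = [clang / gcc for clang, gcc in pairs]
    return prod(ratios) ** (1.0 / len(ratios))


for kernel, pairs in ROWS.items():
    print(f"{kernel:7s} clang/gcc = {geomean_ratio(pairs):.3f}")
```

The geometric mean is used because the figures are ratios, so a 2x win and a 2x loss cancel out exactly. Applied to the table above, kGenMM, kMM_NT and kMM_TN come out below 1.0 and kMV_N, kMV_T and kGER above 1.0, the same split as in the comparison against the Reference install.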
> 
> Please also note, and this is important, that clang does not seem to produce correct code at either -O0 or -O3: both builds fail the ATLAS sanity checks, with wrong results at -O0 and crashes at -O3. The -Oz build does pass them, though.
> 
> Vincent
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev