[cfe-dev] More on atlas and clang

Vincent Habchi vince at macports.org
Sun Mar 10 07:52:26 PDT 2013


Hi there,

I have recently undertaken another experimental build of Atlas (http://math-atlas.sourceforge.net). Briefly speaking, Atlas provides a highly complete BLAS/LAPACK implementation optimized for the native architecture of the computer on which it is running. The build was done on an AVX machine (Mac mini 2011) using a snapshot of clang 3.3 (r173279) provided by MacPorts (http://macports.org), with the -O3, -fPIC, -fvectorize and -fslp-vectorize flags.

I am pleased to say that:

1. The generated AVX code seems fine: a full test session run under an Atlas-based SciPy didn’t raise any error;
2. Performance now seems on par with, or even (sometimes surprisingly) better than, the ‘reference GCC’ build, whatever that means (I was unable to get in touch with the Atlas developer at that time), as evidenced by the table below:

Reference clock rate=3292Mhz, new rate=2300Mhz
  Refrenc : % of clock rate achieved by reference install
  Present : % of clock rate achieved by present ATLAS install

                   single precision                  double precision
           ********************************   *******************************
                 real           complex           real           complex
           ---------------  ---------------  ---------------  ---------------
Benchmark   Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
=========   ======= =======  ======= =======  ======= =======  ======= =======
 kSelMM     1289.9  1407.4   1188.7  1229.8    686.7   826.8    647.4   682.1
 kGenMM      198.2   239.7    198.5   237.8    193.9   231.8    196.0   233.8
 kMM_NT      193.7   266.4    195.2   192.9    184.2   187.4    188.5   197.5
 kMM_TN      198.5   211.1    197.9   226.2    189.8   227.6    189.5   223.2
 BIG_MM     1213.8  1346.7   1241.3  1366.5    652.0   789.5    661.4   795.8
  kMV_N      224.3   308.1    438.8   617.3    115.9   152.1    205.8   283.5
  kMV_T      224.6   313.5    460.3   642.9    123.2   159.6    211.3   288.2
   kGER      148.3   192.4    290.2   381.2     73.3    95.6    144.3   184.3

This is in stark contrast with the previous test, where clang was lagging about 20% behind the GCC-based ‘reference implementation’ on lines 2, 3 and 4 of the table (kGenMM, kMM_NT and kMM_TN), where compiler performance matters most.
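
For anyone who wants to check those three lines numerically, here is a minimal Python sketch that computes the relative change between the Refrenc and Present columns. The figures are copied from the table; the function names are mine and purely illustrative. Note that the two installs ran at different clock rates (3292 MHz vs 2300 MHz), so the per-clock comparison is not the same as an absolute one. The absolute ratio below assumes that throughput equals the percentage times the clock rate, which is my reading of the table header and not something Atlas confirmed to me.

```python
# Percent-of-clock figures copied from the table, as (Refrenc, Present)
# pairs in column order: single real, single complex, double real,
# double complex.
RESULTS = {
    "kGenMM": [(198.2, 239.7), (198.5, 237.8), (193.9, 231.8), (196.0, 233.8)],
    "kMM_NT": [(193.7, 266.4), (195.2, 192.9), (184.2, 187.4), (188.5, 197.5)],
    "kMM_TN": [(198.5, 211.1), (197.9, 226.2), (189.8, 227.6), (189.5, 223.2)],
}

REF_MHZ = 3292  # clock rate of the reference install
NEW_MHZ = 2300  # clock rate of the present install


def per_clock_gain(ref_pct, new_pct):
    """Relative change in % of clock rate achieved, in percent."""
    return (new_pct / ref_pct - 1.0) * 100.0


def absolute_ratio(ref_pct, new_pct, ref_mhz=REF_MHZ, new_mhz=NEW_MHZ):
    """Ratio of absolute throughput, present over reference.

    Assumption: throughput = (pct / 100) * clock rate in MHz.
    """
    return (new_pct * new_mhz) / (ref_pct * ref_mhz)


if __name__ == "__main__":
    for name, pairs in RESULTS.items():
        gains = ", ".join(f"{per_clock_gain(r, p):+.1f}%" for r, p in pairs)
        print(f"{name}: {gains}")
```

A positive per-clock gain means the clang build gets more out of each cycle than the reference install; an absolute ratio below 1 only reflects that my machine is clocked lower than the reference one.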

So, to summarize in two words: kudos, folks!

I will build another version on a Core2Duo machine tonight and see if the results are consistent.

Cheers!
Vincent




