[LLVMdev] Polyhedron 2005 results for dragonegg 3.3svn

Sun Jun 2 01:27:16 PDT 2013

Hi Jack, thanks for splitting out the effects of LLVM's and GCC's vectorizers.

On 01/06/13 21:34, Jack Howarth wrote:
> On Sat, Jun 01, 2013 at 06:45:48AM +0200, Duncan Sands wrote:
>>
>> These results are very disappointing, I was hoping to see a big improvement
>> somewhere instead of no real improvement anywhere (except for gas_dyn) or a
>> regression (eg: mdbx).  I think LLVM now has a reasonable array of fast-math
>> optimizations.  I will try to find time to poke at gas_dyn and induct: since
>> turning on gcc's optimizations there halves the run-time, LLVM's IR optimizers
>> are clearly missing something important.
>>
>> Ciao, Duncan.
>
> Duncan,
>     Appended are another set of benchmark runs where I attempted to decouple the
> fast math optimizations from the vectorization by passing -fno-tree-vectorize.
> I am unclear if dragonegg really honors -fno-tree-vectorize to disable the llvm
> vectorization.

Yes, it does disable LLVM vectorization.

>
> Tested on x86_apple-darwin12
>
> Compile Flags: -ffast-math -funroll-loops -O3 -fno-tree-vectorize

Maybe -march=native would be a good addition.

>
> de-gfc48: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs
> de-gfc48+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns
> gfortran48: /sw/bin/gfortran-fsf-4.8
>
> Run time (secs)

What is the standard deviation for each benchmark?  If each run varies by +-5%
then that means that the changes in runtime of around 3% measured below don't
mean anything.
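
A minimal sketch of that check, assuming each benchmark is run several times per
configuration. The timings below are hypothetical placeholders, not measurements
from this thread, and the threshold k=2.0 is an arbitrary choice for illustration:

```python
import statistics

def relative_noise(times):
    """Sample standard deviation as a fraction of the mean run time."""
    return statistics.stdev(times) / statistics.mean(times)

def is_significant(baseline, candidate, k=2.0):
    """True if the two means differ by more than k combined standard errors.

    k=2.0 is an illustrative threshold, not a value from the thread.
    """
    diff = abs(statistics.mean(baseline) - statistics.mean(candidate))
    std_err = (statistics.variance(baseline) / len(baseline)
               + statistics.variance(candidate) / len(candidate)) ** 0.5
    return diff > k * std_err

# Hypothetical repeated timings in seconds (NOT taken from the tables here).
without_vec = [11.0, 11.5, 10.6, 11.3, 10.9]
with_vec = [11.3, 10.8, 11.6, 11.1, 11.4]

print(f"run-to-run noise: {relative_noise(without_vec):.1%}")
print("difference significant:", is_significant(without_vec, with_vec))
```

With only one run per configuration there is no way to estimate the noise, so a
handful of repeats per benchmark is the minimum needed for this kind of test.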

Comparing with your previous benchmarks, I see:

>
> Benchmark     de-gfc48  de-gfc48   gfortran48
>                          +optzns
>
> ac             11.33      8.10       8.02

Turning on LLVM's vectorizer gives a 2% slowdown.

> aermod         16.03     14.45      16.13

Turning on LLVM's vectorizer gives a 2.5% slowdown.

> air             6.80      5.28       5.73
> capacita       39.89     35.21      34.96

Turning on LLVM's vectorizer gives a 5% speedup.  GCC gets a 5.5% speedup from
its vectorizer.

> channel         2.06      2.29       2.69

GCC gets a 30% speedup from its vectorizer which LLVM doesn't get.  On the
other hand, without vectorization LLVM's version runs 23% faster than GCC's, so
while GCC's vectorizer leaps GCC into the lead, the final speed difference is
more on the order of GCC being 10% faster.
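
The arithmetic behind that estimate can be sketched as follows, using the
channel row of the table above. It assumes that a "30% speedup" means a 30%
reduction in run time, and that LLVM's time is unchanged by its own vectorizer:

```python
# Run times (secs) for channel with vectorization disabled, from the table above.
llvm_novec = 2.06   # de-gfc48
gcc_novec = 2.69    # gfortran48

# Fraction by which LLVM's run time is shorter than GCC's without vectorization.
llvm_lead = 1 - llvm_novec / gcc_novec

# Assumption: "30% speedup" means GCC's vectorizer cuts its run time by 30%.
gcc_vec = gcc_novec * (1 - 0.30)

# Assumption: LLVM gains nothing from its vectorizer here, so its time stays put.
gcc_lead = 1 - gcc_vec / llvm_novec

print(f"LLVM lead without vectorization: {llvm_lead:.0%}")
print(f"GCC lead with vectorization:     {gcc_lead:.0%}")
```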

> doduc          27.35     26.13      25.74
> fatigue         8.83      4.82       4.67

GCC gets a 17% speedup from its vectorizer which LLVM doesn't get.
This is a good one to look at, because all the difference between GCC
and LLVM is coming from the mid-level optimizers: turning on GCC optzns
in dragonegg speeds up the program to GCC levels, so it is possible to
get LLVM IR with and without the effect of GCC optimizations, which should
make it fairly easy to understand what GCC is doing right here.

> gas_dyn        11.41      9.79       9.60

Turning on LLVM's vectorizer gives a 30% speedup.  GCC gets a comparable
speedup from its vectorizer.

> induct         23.95     21.75      21.14

GCC gets a 40% speedup from its vectorizer which LLVM doesn't get.  Like
fatigue, this is a case where we can get IR showing all the improvements that
the GCC optimizers made.

> linpk          15.49     15.48      15.69
> mdbx           11.91     11.28      11.39

Turning on LLVM's vectorizer gives a 2% slowdown.

> nf             29.92     29.57      27.99
> protein        36.34     33.94      31.91

Turning on LLVM's vectorizer gives a 3% speedup.

> rnflow         25.97     25.27      22.78

GCC gets a 7% speedup from its vectorizer which LLVM doesn't get.

> test_fpu       11.48     10.91       9.64

GCC gets a 17% speedup from its vectorizer which LLVM doesn't get.

> tfft            1.92      1.91       1.91
>
> Geom. Mean     13.12     11.70      11.64
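
For reference, the geometric means in the last row can be recomputed from the
per-benchmark times. This is only a sketch: the lists are copied by hand from
the table above, and the helper name is chosen here for illustration:

```python
import math

def geometric_mean(values):
    """Geometric mean computed via the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Run times (secs) copied from the table above, in row order (ac .. tfft).
de_gfc48 = [11.33, 16.03, 6.80, 39.89, 2.06, 27.35, 8.83, 11.41,
            23.95, 15.49, 11.91, 29.92, 36.34, 25.97, 11.48, 1.92]
gfortran48 = [8.02, 16.13, 5.73, 34.96, 2.69, 25.74, 4.67, 9.60,
              21.14, 15.69, 11.39, 27.99, 31.91, 22.78, 9.64, 1.91]

gm_llvm = geometric_mean(de_gfc48)
gm_gcc = geometric_mean(gfortran48)

print(f"de-gfc48 geometric mean:   {gm_llvm:.2f}")
print(f"gfortran48 geometric mean: {gm_gcc:.2f}")
print(f"ratio de-gfc48 / gfortran48: {gm_llvm / gm_gcc:.3f}")
```

The geometric mean weights relative changes equally across benchmarks, so a
short-running test like tfft counts as much as a long-running one like capacita.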

Ciao, Duncan.

>
> Assuming that the de-gfc48+optzns run really has disabled the llvm vectorization,
> I am hoping that additional benchmarking of de-gfc48+optzns with individual
> -ffast-math optimizations disabled (such as passing -fno-unsafe-math-optimizations)
> may give us a clue as to the origin of the performance delta between the stock
> dragonegg results with -ffast-math and those with -fplugin-arg-dragonegg-enable-gcc-optzns.
>        Jack
>