[LLVMdev] Polyhedron 2005 results for dragonegg 3.3svn

Sun Jun 2 10:08:16 PDT 2013

Jack,

Can you please file a bug report and attach the BC files for the major loops that we miss?

Thanks,
Nadav

On Jun 2, 2013, at 1:27, Duncan Sands <duncan.sands at gmail.com> wrote:

> Hi Jack, thanks for splitting out what the effects of LLVM's / GCC's vectorizers
> are.
> 
> On 01/06/13 21:34, Jack Howarth wrote:
>> On Sat, Jun 01, 2013 at 06:45:48AM +0200, Duncan Sands wrote:
>>> 
>>> These results are very disappointing, I was hoping to see a big improvement
>>> somewhere instead of no real improvement anywhere (except for gas_dyn) or a
>>> regression (eg: mdbx).  I think LLVM now has a reasonable array of fast-math
>>> optimizations.  I will try to find time to poke at gas_dyn and induct: since
>>> turning on gcc's optimizations there halves the run-time, LLVM's IR optimizers
>>> are clearly missing something important.
>>> 
>>> Ciao, Duncan.
>> 
>> Duncan,
>>    Appended are another set of benchmark runs where I attempted to decouple the
>> fast math optimizations from the vectorization by passing -fno-tree-vectorize.
>> I am unclear if dragonegg really honors -fno-tree-vectorize to disable the llvm
>> vectorization.
> 
> Yes, it does disable LLVM vectorization.
> 
>> 
>> Tested on x86_apple-darwin12
>> 
>> Compile Flags: -ffast-math -funroll-loops -O3 -fno-tree-vectorize
> 
> Maybe -march=native would be a good addition.
> 
>> 
>> de-gfc48: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs
>> de-gfc48+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns
>> gfortran48: /sw/bin/gfortran-fsf-4.8
>> 
>> Run time (secs)
> 
> What is the standard deviation for each benchmark?  If each run varies by +-5%
> then that means that the changes in runtime of around 3% measured below don't
> mean anything.
> 
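The point about run-to-run variation can be checked with a few repeated timings per benchmark. The following is a minimal sketch, assuming each configuration is timed several times; the function names, the threshold k=2.0 and all sample timings are hypothetical and are not taken from the runs in this thread.

```python
import math
import statistics


def summarize(runs):
    """Return (mean, sample standard deviation) of a list of run times."""
    return statistics.mean(runs), statistics.stdev(runs)


def difference_exceeds_noise(runs_a, runs_b, k=2.0):
    """Rough check: is the gap between the two means larger than
    k standard errors of the difference? (k=2.0 is an arbitrary choice.)"""
    mean_a, sd_a = summarize(runs_a)
    mean_b, sd_b = summarize(runs_b)
    stderr = math.sqrt(sd_a ** 2 / len(runs_a) + sd_b ** 2 / len(runs_b))
    return abs(mean_a - mean_b) > k * stderr


# Hypothetical timings (seconds): the means differ by about 2%, but the
# individual runs scatter by several percent.
noisy_base = [11.30, 11.90, 10.80, 11.60, 11.05]
noisy_vect = [11.55, 11.00, 12.10, 11.40, 11.75]

# Hypothetical timings with the same means but very little scatter.
tight_base = [11.33, 11.34, 11.32]
tight_vect = [11.56, 11.57, 11.55]

print(difference_exceeds_noise(noisy_base, noisy_vect))
print(difference_exceeds_noise(tight_base, tight_vect))
```

With the noisy samples the same gap between means cannot be distinguished from scatter, while with the tight samples it can.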
> 
> Comparing with your previous benchmarks, I see:
> 
>> 
>> Benchmark     de-gfc48  de-gfc48   gfortran48
>>                         +optzns
>> 
>> ac             11.33      8.10       8.02
> 
> Turning on LLVM's vectorizer gives a 2% slowdown.
> 
>> aermod         16.03     14.45      16.13
> 
> Turning on LLVM's vectorizer gives a 2.5% slowdown.
> 
>> air             6.80      5.28       5.73
>> capacita       39.89     35.21      34.96
> 
> Turning on LLVM's vectorizer gives a 5% speedup.  GCC gets a 5.5% speedup from
> its vectorizer.
> 
>> channel         2.06      2.29       2.69
> 
> GCC gets a 30% speedup from its vectorizer which LLVM doesn't get.  On the
> other hand, without vectorization LLVM's version runs 23% faster than GCC's, so
> while GCC's vectorizer leaps GCC into the lead, the final speed difference is
> more on the order of GCC 10% faster.
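The 23% figure for channel can be reproduced from the table row (2.06 s for de-gfc48 against 2.69 s for gfortran48, both without vectorization). A small sketch of the arithmetic follows; the helper names are illustrative, and the choice of the slower time as the denominator is an assumption about how the percentage was computed.

```python
def percent_faster(fast, slow):
    """Run-time reduction of `fast` relative to `slow`, in percent."""
    return 100.0 * (slow - fast) / slow


def percent_slower(fast, slow):
    """Extra run time of `slow` relative to `fast`, in percent."""
    return 100.0 * (slow - fast) / fast


# channel row: de-gfc48 vs gfortran48, both built with -fno-tree-vectorize
llvm_time = 2.06
gcc_time = 2.69

print(f"LLVM takes {percent_faster(llvm_time, gcc_time):.1f}% less time than GCC")
print(f"GCC takes {percent_slower(llvm_time, gcc_time):.1f}% more time than LLVM")
```

The two conventions give different numbers for the same pair of timings, so it helps to state which one is being quoted.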
> 
>> doduc          27.35     26.13      25.74
>> fatigue         8.83      4.82       4.67
> 
> GCC gets a 17% speedup from its vectorizer which LLVM doesn't get.
> This is a good one to look at, because all the difference between GCC
> and LLVM is coming from the mid-level optimizers: turning on GCC optzns
> in dragonegg speeds up the program to GCC levels, so it is possible to
> get LLVM IR with and without the effect of GCC optimizations, which should
> make it fairly easy to understand what GCC is doing right here.
> 
>> gas_dyn        11.41      9.79       9.60
> 
> Turning on LLVM's vectorizer gives a 30% speedup.  GCC gets a comparable
> speedup from its vectorizer.
> 
>> induct         23.95     21.75      21.14
> 
> GCC gets a 40% speedup from its vectorizer which LLVM doesn't get.  Like
> fatigue, this is a case where we can get IR showing all the improvements that
> the GCC optimizers made.
> 
>> linpk          15.49     15.48      15.69
>> mdbx           11.91     11.28      11.39
> 
> Turning on LLVM's vectorizer gives a 2% slowdown.
> 
>> nf             29.92     29.57      27.99
>> protein        36.34     33.94      31.91
> 
> Turning on LLVM's vectorizer gives a 3% speedup.
> 
>> rnflow         25.97     25.27      22.78
> 
> GCC gets a 7% speedup from its vectorizer which LLVM doesn't get.
> 
>> test_fpu       11.48     10.91       9.64
> 
> GCC gets a 17% speedup from its vectorizer which LLVM doesn't get.
> 
>> tfft            1.92      1.91       1.91
>> 
>> Geom. Mean     13.12     11.70      11.64
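The "Geom. Mean" row can be recomputed from the sixteen per-benchmark times in the table above. This is a minimal sketch using only the numbers quoted in this message; the variable and function names are illustrative.

```python
import math

# Run times in seconds, in table order (ac ... tfft).
times = {
    "de-gfc48": [11.33, 16.03, 6.80, 39.89, 2.06, 27.35, 8.83, 11.41,
                 23.95, 15.49, 11.91, 29.92, 36.34, 25.97, 11.48, 1.92],
    "de-gfc48+optzns": [8.10, 14.45, 5.28, 35.21, 2.29, 26.13, 4.82, 9.79,
                        21.75, 15.48, 11.28, 29.57, 33.94, 25.27, 10.91, 1.91],
    "gfortran48": [8.02, 16.13, 5.73, 34.96, 2.69, 25.74, 4.67, 9.60,
                   21.14, 15.69, 11.39, 27.99, 31.91, 22.78, 9.64, 1.91],
}


def geometric_mean(values):
    """Geometric mean computed as exp of the average of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


for name, column in times.items():
    print(f"{name:16s} {geometric_mean(column):6.2f}")
```

The geometric mean is used here rather than the arithmetic mean so that long-running benchmarks such as capacita do not dominate the summary.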
> 
> Ciao, Duncan.
> 
>> 
>> Assuming that the de-gfc48+optzns run really has disabled the llvm vectorization,
>> I am hoping that additional benchmarking of de-gfc48+optzns with individual
>> -ffast-math optimizations disabled (such as passing -fno-unsafe-math-optimizations)
>> may give us a clue as to the origin of the performance delta between the stock
>> dragonegg results with -ffast-math and those with -fplugin-arg-dragonegg-enable-gcc-optzns.
>>       Jack
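The experiment proposed above, rerunning with one -ffast-math component disabled at a time, could be driven by a small script that generates the flag sets. This is only a sketch: the base flags come from this message, -fno-unsafe-math-optimizations is the example named above, and the other two negations are an assumed selection of GCC options that -ffast-math normally changes. The script only prints the flag sets and does not invoke a compiler.

```python
# Base flags as used for the runs reported in this message.
BASE_FLAGS = ["-ffast-math", "-funroll-loops", "-O3", "-fno-tree-vectorize"]

# Assumed selection of options that switch individual -ffast-math
# components back off; extend or trim as needed.
NEGATIONS = [
    "-fno-unsafe-math-optimizations",
    "-fno-finite-math-only",
    "-fmath-errno",
]


def flag_variants(base, negations):
    """Map each negation to the base flags with that negation appended.

    The negation goes last so that it takes effect after -ffast-math.
    """
    return {neg: base + [neg] for neg in negations}


variants = flag_variants(BASE_FLAGS, NEGATIONS)
for negation, flags in variants.items():
    print(f"{negation}: {' '.join(flags)}")
```

Each printed flag set would then be used for one full benchmark run and compared against the unmodified -ffast-math baseline.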
>> 
> 