[LLVMdev] Polyhedron 2005 results for dragonegg 3.3svn
Nadav Rotem
nrotem at apple.com
Sun Jun 2 10:08:16 PDT 2013
Jack,
Can you please file a bug report and attach the BC files for the major loops that we miss?
Thanks,
Nadav
On Jun 2, 2013, at 1:27, Duncan Sands <duncan.sands at gmail.com> wrote:
> Hi Jack, thanks for splitting out what the effects of LLVM's / GCC's vectorizers
> are.
>
> On 01/06/13 21:34, Jack Howarth wrote:
>> On Sat, Jun 01, 2013 at 06:45:48AM +0200, Duncan Sands wrote:
>>>
>>> These results are very disappointing: I was hoping to see a big improvement
>>> somewhere instead of no real improvement anywhere (except for gas_dyn) or a
>>> regression (e.g. mdbx). I think LLVM now has a reasonable array of fast-math
>>> optimizations. I will try to find time to poke at gas_dyn and induct: since
>>> turning on GCC's optimizations there halves the run-time, LLVM's IR optimizers
>>> are clearly missing something important.
>>>
>>> Ciao, Duncan.
>>
>> Duncan,
>> Appended is another set of benchmark runs where I attempted to decouple the
>> fast math optimizations from the vectorization by passing -fno-tree-vectorize.
>> It is unclear to me whether dragonegg really honors -fno-tree-vectorize to
>> disable the LLVM vectorization.
>
> Yes, it does disable LLVM vectorization.
>
>>
>> Tested on x86_apple-darwin12
>>
>> Compile Flags: -ffast-math -funroll-loops -O3 -fno-tree-vectorize
>
> Maybe -march=native would be a good addition.
>
>>
>> de-gfc48: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs
>> de-gfc48+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns
>> gfortran48: /sw/bin/gfortran-fsf-4.8
>>
>> Run time (secs)
>
> What is the standard deviation for each benchmark? If each run varies by ±5%,
> then the changes in runtime of around 3% measured below don't
> mean anything.
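
One way to act on this point is to time each benchmark several times and compare the gap between two builds against the run-to-run spread. The sketch below is only an illustration: the timing lists are hypothetical (the thread gives a single number per benchmark), and the "two standard errors" threshold is an assumed rule of thumb, not something proposed in the thread.

```python
import statistics

def summarize(times):
    """Return (mean, sample standard deviation) of repeated timings."""
    return statistics.mean(times), statistics.stdev(times)

def exceeds_noise(times_a, times_b, k=2.0):
    """True if the gap between the two means is larger than k times
    the combined standard error of those means."""
    mean_a, sd_a = summarize(times_a)
    mean_b, sd_b = summarize(times_b)
    std_err = (sd_a**2 / len(times_a) + sd_b**2 / len(times_b)) ** 0.5
    return abs(mean_a - mean_b) > k * std_err

# Hypothetical repeated runs (secs) for one benchmark under two builds.
build_a = [11.30, 11.90, 10.80, 11.60, 11.10]
build_b = [11.00, 11.50, 10.60, 11.70, 11.20]

mean_a, sd_a = summarize(build_a)
print(f"build A: mean {mean_a:.2f}s, stdev {sd_a:.2f}s")
print("difference exceeds noise:", exceeds_noise(build_a, build_b))
```

With spreads this wide, a gap of a percent or two between the means does not clear the threshold, which is the situation the question above warns about.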
>
>
> Comparing with your previous benchmarks, I see:
>
>>
>> Benchmark de-gfc48 de-gfc48+optzns gfortran48
>>
>> ac 11.33 8.10 8.02
>
> Turning on LLVM's vectorizer gives a 2% slowdown.
>
>> aermod 16.03 14.45 16.13
>
> Turning on LLVM's vectorizer gives a 2.5% slowdown.
>
>> air 6.80 5.28 5.73
>> capacita 39.89 35.21 34.96
>
> Turning on LLVM's vectorizer gives a 5% speedup. GCC gets a 5.5% speedup from
> its vectorizer.
>
>> channel 2.06 2.29 2.69
>
> GCC gets a 30% speedup from its vectorizer which LLVM doesn't get. On the
> other hand, without vectorization LLVM's version runs 23% faster than GCC's, so
> while GCC's vectorizer leaps GCC into the lead, the final speed difference is
> closer to GCC being about 10% faster.
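
The 23% figure can be reproduced from the channel row if "faster" is read as the reduction in run time relative to the slower build. That reading is an assumption: the post does not state the convention, and taking the ratio of run times minus one would give a larger number. The helper name `percent_faster` is made up for this sketch.

```python
def percent_faster(t_fast, t_slow):
    """Run-time reduction of the faster build, as a percentage of the slower one."""
    return 100.0 * (t_slow - t_fast) / t_slow

# channel row with vectorizers disabled: dragonegg (LLVM) 2.06 s, gfortran 2.69 s
print(round(percent_faster(2.06, 2.69), 1))
```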
>
>> doduc 27.35 26.13 25.74
>> fatigue 8.83 4.82 4.67
>
> GCC gets a 17% speedup from its vectorizer which LLVM doesn't get.
> This is a good one to look at, because all the difference between GCC
> and LLVM is coming from the mid-level optimizers: turning on GCC optzns
> in dragonegg speeds up the program to GCC levels, so it is possible to
> get LLVM IR with and without the effect of GCC optimizations, which should
> make it fairly easy to understand what GCC is doing right here.
>
>> gas_dyn 11.41 9.79 9.60
>
> Turning on LLVM's vectorizer gives a 30% speedup. GCC gets a comparable
> speedup from its vectorizer.
>
>> induct 23.95 21.75 21.14
>
> GCC gets a 40% speedup from its vectorizer which LLVM doesn't get. Like
> fatigue, this is a case where we can get IR showing all the improvements that
> the GCC optimizers made.
>
>> linpk 15.49 15.48 15.69
>> mdbx 11.91 11.28 11.39
>
> Turning on LLVM's vectorizer gives a 2% slowdown.
>
>> nf 29.92 29.57 27.99
>> protein 36.34 33.94 31.91
>
> Turning on LLVM's vectorizer gives a 3% speedup.
>
>> rnflow 25.97 25.27 22.78
>
> GCC gets a 7% speedup from its vectorizer which LLVM doesn't get.
>
>> test_fpu 11.48 10.91 9.64
>
> GCC gets a 17% speedup from its vectorizer which LLVM doesn't get.
>
>> tfft 1.92 1.91 1.91
>>
>> Geom. Mean 13.12 11.70 11.64
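
The Geom. Mean row can be checked directly from the per-benchmark times. The sketch below copies the run times from the rows above (ac through tfft) and computes the geometric mean as the exponential of the mean logarithm; the dictionary layout and function name are choices made for this example, not part of the original benchmark scripts.

```python
import math

# Run times (secs) copied from the table rows above, in order ac .. tfft.
RUNTIMES = {
    "de-gfc48": [11.33, 16.03, 6.80, 39.89, 2.06, 27.35, 8.83, 11.41,
                 23.95, 15.49, 11.91, 29.92, 36.34, 25.97, 11.48, 1.92],
    "de-gfc48+optzns": [8.10, 14.45, 5.28, 35.21, 2.29, 26.13, 4.82, 9.79,
                        21.75, 15.48, 11.28, 29.57, 33.94, 25.27, 10.91, 1.91],
    "gfortran48": [8.02, 16.13, 5.73, 34.96, 2.69, 25.74, 4.67, 9.60,
                   21.14, 15.69, 11.39, 27.99, 31.91, 22.78, 9.64, 1.91],
}

def geometric_mean(values):
    """Geometric mean computed via the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

for name, times in RUNTIMES.items():
    print(f"{name:16s} {geometric_mean(times):.2f}")
```

The printed values should land close to the Geom. Mean row. The geometric mean is used here because it weights each benchmark's relative change equally, so a long-running case such as capacita does not dominate the summary.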
>
> Ciao, Duncan.
>
>>
>> Assuming that the de-gfc48+optzns run really has disabled the llvm vectorization,
>> I am hoping that additional benchmarking of de-gfc48+optzns with individual
>> -ffast-math optimizations disabled (such as passing -fno-unsafe-math-optimizations)
>> may give us a clue as to the origin of the performance delta between the stock
>> dragonegg results with -ffast-math and those with -fplugin-arg-dragonegg-enable-gcc-optzns.
>> Jack
>>
>