[LLVMdev] pb05 results for current llvm/dragonegg
Jack Howarth
howarth at bromo.med.uc.edu
Tue Apr 3 13:50:59 PDT 2012
Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn
on x86_64-apple-darwin11, built against Xcode 4.3.2 and FSF gcc 4.6.3. The results
for -msse3 and -msse4 appear identical (at least for degg+optnz). This is fortunate,
since there seems to be a bug in -msse4 on the 2.33 GHz (T7600) Intel Core 2 Duo Merom
(http://llvm.org/bugs/show_bug.cgi?id=12434). I've added two entries to the table.
The first, degg+novect+optnz, should show the optimizations achieved by
-fplugin-arg-dragonegg-enable-gcc-optzns in the absence of autovectorization by
FSF gcc. It therefore indicates which optimization opportunities, outside of
autovectorization, are still being missed at the LLVM IR level. The second,
degg+fullvect+optnz, uses the new LLVM autovectorization option with all of its
related options set. The results are mixed: compared with the simple
-fplugin-arg-dragonegg-llvm-option=-vectorize, some benchmarks run faster and
some run slower.
Jack
llvm/dragonegg r153877
dragonegg:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
degg+vectorize:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n
degg+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n
gfortran:
gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
degg+novect+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n
degg+fullvect+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-llvm-option=-vectorize -fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial -fplugin-arg-dragonegg-llvm-option=-unroll-runtime -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints %n.f90 -o %n
Ave Run (secs)
                dragonegg  degg+vectorize  degg+optnz  gfortran  degg+novect+optnz  degg+fullvect+optnz
ac                  12.45           12.45        8.85      8.80               8.90                10.89
aermod              16.15           16.05       14.80     17.48              14.12                15.84
air                  7.10            7.11        6.46      5.50               6.46                 8.15
capacita            40.00           39.96       37.72     32.62              39.38                39.94
channel              2.16            2.15        1.99      1.84               2.15                 2.56
doduc               29.13           28.41       27.48     26.74              28.27                29.05
fatigue              8.75            9.03        8.11      8.44               7.28                10.49
gas_dyn             11.72           11.80        4.47      4.26              10.02                11.63
induct              24.02           24.91       12.08     13.65              20.54                24.68
linpk               15.40           15.78       15.74     15.45              15.39                15.46
mdbx                11.80           12.22       11.86     11.20              11.82                11.50
nf                  28.45           28.50       29.25     27.91              29.17                28.16
protein             38.15           39.26       37.87     32.49              39.08                38.62
rnflow              32.25           32.35       26.47     24.06              28.75                31.05
test_fpu            11.34           11.35        9.31      8.04              10.88                10.19
tftt                 1.91            1.92        1.93      1.87               1.94                 1.90
Geometric Mean      13.50           13.62       11.34     10.87              12.53                13.65
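For anyone who wants to re-check the Geometric Mean row, here is a minimal Python sketch of the calculation. The variable and function names are illustrative only and are not part of the benchmark harness; the sixteen values are the dragonegg column of the run-time table, and the result should land close to the 13.50 reported there. The other columns can be checked the same way.

```python
import math

# Average run times in seconds for the "dragonegg" column, in table order
# (ac, aermod, air, ..., tftt). Copied from the run-time table in this post.
DRAGONEGG_RUN_SECS = [
    12.45, 16.15, 7.10, 40.00, 2.16, 29.13, 8.75, 11.72,
    24.02, 15.40, 11.80, 28.45, 38.15, 32.25, 11.34, 1.91,
]


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(x))) to avoid a huge raw product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"dragonegg geometric mean: {geometric_mean(DRAGONEGG_RUN_SECS):.2f} s")
```

The geometric mean is used rather than the arithmetic mean so that long-running benchmarks such as capacita and protein do not dominate the summary.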
Compile (secs)
                dragonegg  degg+vectorize  degg+optnz  gfortran  degg+novect+optnz  degg+fullvect+optnz
ac                   0.33            0.38        0.72      1.27               0.71                 0.39
aermod              25.91           27.58       32.34     43.91              25.13                23.62
air                  1.07            1.25        1.52      2.25               1.36                 1.34
capacita             0.49            0.52        0.89      1.71               0.71                 0.98
channel              0.29            0.36        0.50      0.62               0.42                 0.49
doduc                1.71            4.50        3.25      5.34               2.75                 5.42
fatigue              0.84            0.97        1.19      1.76               1.00                 1.24
gas_dyn              0.67            0.68        1.20      3.02               0.90                 1.81
induct               1.60            2.14        2.82      3.99               2.53                 2.15
linpk                0.22            0.24        0.47      0.78               0.30                 0.46
mdbx                 0.63            0.77        1.16      1.85               0.99                 1.12
nf                   0.37            0.40        0.70      1.66               0.42                 1.22
protein              0.93            1.02        1.75      4.01               1.40                 2.73
rnflow               1.20            1.25        2.63      5.44               1.72                 2.85
test_fpu             0.88            0.92        2.13      4.39               1.26                 2.38
tftt                 0.21            0.24        0.34      0.56               0.30                 0.27
Executable (bytes)
                dragonegg  degg+vectorize  degg+optnz  gfortran  degg+novect+optnz  degg+fullvect+optnz
ac                  26856           26856       39120     50968              39120                35144
aermod            1043700         1055988     1046288   1265640            1013488              1146196
air                 62004           62004       53740     73988              53740                78392
capacita            41416           41416       45552     73896              41416                70096
channel             22808           22808       26768     34784              22672                34984
doduc              128448          128448      136996    197240             128868               173512
fatigue             69824           69824       69840     86080              65712                78016
gas_dyn             59112           59112       67416    119744              59160                91952
induct             163152          167248      167344    174976             176696               179552
linpk               18752           18752       27056     38648              18904                31200
mdbx                53692           53692       57884     82112              53788                70080
nf                  23960           23960       32104     71800              23912                48568
protein             75032           75032       87208    132040              78912               132376
rnflow              71896           71896       96632    181120              67928               137528
test_fpu            54272           54272       78776    155072              50144               111640
tftt                18640           18640       18488     30768              18488                22744