[LLVMdev] pb05 results for current llvm/dragonegg
Hal Finkel
hfinkel at anl.gov
Tue Apr 3 06:33:33 PDT 2012
On Tue, 3 Apr 2012 08:57:51 -0400
Jack Howarth <howarth at bromo.med.uc.edu> wrote:
> On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> > Hi Jack,
> >
> >> Attached are the Polyhedron 2005 benchmark results for current
> >> llvm/dragonegg svn on x86_64-apple-darwin11 built against Xcode
> >> 4.3.2 and FSF gcc 4.6.3.
> >
> > thanks for the numbers. How does this compare to LLVM 3.0 - were
> > there any regressions?
>
> The results from just before llvm/dragonegg 3.0 was released are at...
>
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html
>
> It does look as if the ac benchmark has regressed from 10.80 sec
> in llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. These are
> slightly different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3 but I would
> be shocked if that was the origin of the performance regression).
> The results for -fplugin-arg-dragonegg-enable-gcc-optzns don't
> seem much improved in llvm 3.1 so I assume this means little progress
> was made in eliminating the scalarization of vectorizations in this
> release. Did we even get any code added to llvm that would allow us
> to identify instances of these scalarizations through a compiler
> warning? Also, the current
> -fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do
> almost nothing in terms of vectorization. Do we need to pass any
> additional flags to actually achieve autovectorization via llvm
Currently, we only have basic-block vectorization, so to get
autovectorization of loops (which is probably what we want here), the
loops need to be unrolled. I see that all categories include
-funroll-loops; does that do anything if we're not using gcc's
optimizations?
I generally run with both -unroll-allow-partial and -unroll-runtime so
that llvm's unroller will do as much as it can. Also, in many of these
cases, it looks like the vectorization is doing *something*, just not
anything overly helpful ;) -vectorize is new, so it is helpful to
get feedback on what is actually useful.
You might try including -bb-vectorize-aligned-only (sse3 does not
actually have unaligned load/stores, right?). Other things to try
include -bb-vectorize-no-ints (determining when to vectorize integer
ops may be trickier than floating-point ops) and setting the required
chain depth to something less than the current default of 6 (for
example, -bb-vectorize-req-chain-depth=3), which will cause a lot more
vectorization.
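
As a rough illustration of how the options above could be combined with the degg+vectorize command line from the quoted results, here is a small Python sketch. The helper name dragonegg_command and the choice of ac.f90 are hypothetical. It also assumes that every LLVM option is forwarded with the same -fplugin-arg-dragonegg-llvm-option= prefix used for -vectorize, and that an option carrying an embedded "=" is passed through unchanged; that last point should be checked against the dragonegg README.

```python
import shlex

# Prefix the quoted results use to forward -vectorize to LLVM.
DRAGONEGG_PREFIX = "-fplugin-arg-dragonegg-llvm-option="


def dragonegg_command(source, llvm_options):
    """Build a de-gfortran46 command line (hypothetical helper)."""
    base = ["de-gfortran46", "-msse3", "-ffast-math", "-funroll-loops", "-O3"]
    forwarded = [DRAGONEGG_PREFIX + opt for opt in llvm_options]
    output = source.rsplit(".", 1)[0]  # ac.f90 -> ac
    return base + forwarded + [source, "-o", output]


# Options mentioned in this thread. Forwarding "name=value" unchanged
# is an assumption, not something confirmed here.
llvm_options = [
    "-vectorize",
    "-unroll-allow-partial",
    "-unroll-runtime",
    "-bb-vectorize-req-chain-depth=3",
]

cmd = dragonegg_command("ac.f90", llvm_options)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Swapping entries in llvm_options (for example adding -bb-vectorize-aligned-only or -bb-vectorize-no-ints) gives one command line per configuration to time.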
-Hal
> (in
> absence of -ftree-vectorize and
> -fplugin-arg-dragonegg-enable-gcc-optzns)? Jack
>
> >
> > Ciao, Duncan.
> >
> >> The benchmarks
> >> for -msse3 and -msse4 appear identical (at least for degg+optnz).
> >> This is fortunate since there seems to be a bug in -msse4 on 2.33
> >> GHz (T7600) Intel Core 2 Duo Merom
> >> (http://llvm.org/bugs/show_bug.cgi?id=12434). Jack
> >>
> >> llvm/dragonegg r153877
> >>
> >> dragonegg:
> >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
> >>
> >> degg+vectorize:
> >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
> >> -fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n
> >>
> >> degg+optnz:
> >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
> >> -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n
> >>
> >> gfortran:
> >> gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
> >>
> >> Ave Run (secs)
> >>                 dragonegg  degg+vectorize  degg+optnz  gfortran
> >> ac                  12.45           12.45        8.85      8.80
> >> aermod              16.15           16.05       14.80     17.48
> >> air                  7.10            7.11        6.46      5.50
> >> capacita            40.00           39.96       37.72     32.62
> >> channel              2.16            2.15        1.99      1.84
> >> doduc               29.13           28.41       27.48     26.74
> >> fatigue              8.75            9.03        8.11      8.44
> >> gas_dyn             11.72           11.80        4.47      4.26
> >> induct              24.02           24.91       12.08     13.65
> >> linpk               15.40           15.78       15.74     15.45
> >> mdbx                11.80           12.22       11.86     11.20
> >> nf                  28.45           28.50       29.25     27.91
> >> protein             38.15           39.26       37.87     32.49
> >> rnflow              32.25           32.35       26.47     24.06
> >> test_fpu            11.34           11.35        9.31      8.04
> >> tftt                 1.91            1.92        1.93      1.87
> >>
> >> Geometric Mean      13.50           13.62       11.34     10.87
> >>
> >> Compile (secs)
> >>                 dragonegg  degg+vectorize  degg+optnz  gfortran
> >> ac                   0.33            0.38        0.72      1.27
> >> aermod              25.91           27.58       32.34     43.91
> >> air                  1.07            1.25        1.52      2.25
> >> capacita             0.49            0.52        0.89      1.71
> >> channel              0.29            0.36        0.50      0.62
> >> doduc                1.71            4.50        3.25      5.34
> >> fatigue              0.84            0.97        1.19      1.76
> >> gas_dyn              0.67            0.68        1.20      3.02
> >> induct               1.60            2.14        2.82      3.99
> >> linpk                0.22            0.24        0.47      0.78
> >> mdbx                 0.63            0.77        1.16      1.85
> >> nf                   0.37            0.40        0.70      1.66
> >> protein              0.93            1.02        1.75      4.01
> >> rnflow               1.20            1.25        2.63      5.44
> >> test_fpu             0.88            0.92        2.13      4.39
> >> tftt                 0.21            0.24        0.34      0.56
> >>
> >> Executable (bytes)
> >>                 dragonegg  degg+vectorize  degg+optnz  gfortran
> >> ac                  26856           26856       39120     50968
> >> aermod            1043700         1055988     1046288   1265640
> >> air                 62004           62004       53740     73988
> >> capacita            41416           41416       45552     73896
> >> channel             22808           22808       26768     34784
> >> doduc              128448          128448      136996    197240
> >> fatigue             69824           69824       69840     86080
> >> gas_dyn             59112           59112       67416    119744
> >> induct             163152          167248      167344    174976
> >> linpk               18752           18752       27056     38648
> >> mdbx                53692           53692       57884     82112
> >> nf                  23960           23960       32104     71800
> >> protein             75032           75032       87208    132040
> >> rnflow              71896           71896       96632    181120
> >> test_fpu            54272           54272       78776    155072
> >> tftt                18640           18640       18488     30768
> >>
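
For anyone wanting to re-derive the "Geometric Mean" row of the run-time table, the following Python sketch recomputes it for the dragonegg and gfortran columns. The numbers are copied from the quoted table; the dictionary layout and the function name geometric_mean are only illustrative, and this is not the script the Polyhedron harness itself uses.

```python
import math

# Average run times in seconds, copied from the "Ave Run (secs)" table,
# in the order ac, aermod, air, ..., tftt.
run_secs = {
    "dragonegg": [12.45, 16.15, 7.10, 40.00, 2.16, 29.13, 8.75, 11.72,
                  24.02, 15.40, 11.80, 28.45, 38.15, 32.25, 11.34, 1.91],
    "gfortran": [8.80, 17.48, 5.50, 32.62, 1.84, 26.74, 8.44, 4.26,
                 13.65, 15.45, 11.20, 27.91, 32.49, 24.06, 8.04, 1.87],
}


def geometric_mean(values):
    """Return the n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


geo = {name: geometric_mean(times) for name, times in run_secs.items()}
for name, value in geo.items():
    print(f"{name:10s} {value:6.2f}")

# Overall slowdown of plain dragonegg relative to gfortran.
print(f"dragonegg / gfortran = {geo['dragonegg'] / geo['gfortran']:.2f}")
```

The same function can be applied to the degg+vectorize and degg+optnz columns to compare configurations on a single number.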
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory