[LLVMdev] pb05 results for current llvm/dragonegg

Tue Apr 3 06:33:33 PDT 2012

On Tue, 3 Apr 2012 08:57:51 -0400
Jack Howarth <howarth at bromo.med.uc.edu> wrote:

> On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> > Hi Jack,
> >
> >>    Attached are the Polyhedron 2005 benchmark results for current
> >> llvm/dragonegg svn on x86_64-apple-darwin11 built against Xcode
> >> 4.3.2 and FSF gcc 4.6.3.
> >
> > thanks for the numbers.  How does this compare to LLVM 3.0 - were
> > there any regressions?
> 
> The results from just before llvm/dragonegg 3.0 was released are at...
> 
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html
> 
> It does look as if the ac benchmark has regressed from 10.80 sec
> in llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. These are
> slightly different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3), but I
> would be shocked if that were the origin of the performance regression.
>    The results for -fplugin-arg-dragonegg-enable-gcc-optzns don't
> seem much improved in llvm 3.1, so I assume this means little progress
> was made in eliminating the scalarization of vectorizations in this
> release. Did we even get any code added to llvm that would allow us
> to identify instances of these scalarizations through a compiler
> warning? Also, the current
> -fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do
> almost nothing in terms of vectorization. Do we need to pass any
> additional flags to actually achieve autovectorization via llvm 

Currently, we only have basic-block vectorization, so to get
autovectorization of loops (which is probably what we want here), the
loops need to be unrolled. I see that all categories include
-funroll-loops; does that do anything if we're not using gcc's
optimizations?

I generally run with both -unroll-allow-partial and -unroll-runtime so
that llvm's unroller will do as much as it can. Also, in many of these
cases, it looks like the vectorization is doing *something*, just not
anything overly helpful ;) -vectorize is new, so it is helpful to
get feedback on what is actually useful.

You might try including -bb-vectorize-aligned-only (sse3 does not
actually have unaligned load/stores, right?). Other things to try
include -bb-vectorize-no-ints (determining when to vectorize integer
ops may be trickier than floating-point ops) and setting the required
chain depth to something less than the current default of 6; for
example, -bb-vectorize-req-chain-depth=3 will cause a lot more
vectorization.
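
As a rough sketch of how these flags could be combined with the
degg+vectorize configuration quoted below, the snippet here only assembles
and prints a command line; it does not invoke the compiler. It assumes
that -fplugin-arg-dragonegg-llvm-option can be repeated, once per LLVM
flag, and it uses "ac" as a stand-in for %n. The value-carrying
-bb-vectorize-req-chain-depth option is left out on purpose, since how its
'=' should be written inside a plugin argument is not covered here.

```shell
# Sketch only: build (and print, not run) a degg+vectorize command line
# with the extra LLVM flags suggested above.
# Assumption: -fplugin-arg-dragonegg-llvm-option may be given once per flag.
bench=ac   # stands in for %n in the benchmark configuration
llvm_flags="-vectorize -unroll-allow-partial -unroll-runtime -bb-vectorize-aligned-only"

cmd="de-gfortran46 -msse3 -ffast-math -funroll-loops -O3"
for flag in $llvm_flags; do
  # Each LLVM flag is forwarded through its own plugin argument.
  cmd="$cmd -fplugin-arg-dragonegg-llvm-option=$flag"
done
cmd="$cmd $bench.f90 -o $bench"

echo "$cmd"
```

Adding -bb-vectorize-no-ints to llvm_flags in the same way would give a
second variant to time against this one.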

 -Hal

> (in absence of -ftree-vectorize and
> -fplugin-arg-dragonegg-enable-gcc-optzns)? Jack
> 
> >
> > Ciao, Duncan.
> >
> >> The benchmarks
> >> for -msse3 and -msse4 appear identical (at least for degg+optnz).
> >> This is fortunate since there seems to be a bug in -msse4 on 2.33
> >> GHz (T7600) Intel Core 2 Duo Merom
> >> (http://llvm.org/bugs/show_bug.cgi?id=12434). Jack
> >>
> >> llvm/dragonegg r153877
> >>
> >> dragonegg:
> >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
> >>
> >> degg+vectorize:
> >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
> >> -fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n
> >>
> >> degg+optnz:
> >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
> >> -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n
> >>
> >> gfortran:
> >> gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
> >>
> >> Ave Run (secs)
> >>                 dragonegg degg+vectorize degg+optnz  gfortran
> >> ac               12.45       12.45         8.85       8.80
> >> aermod           16.15       16.05        14.80      17.48
> >> air               7.10        7.11         6.46       5.50
> >> capacita         40.00       39.96        37.72      32.62
> >> channel           2.16        2.15         1.99       1.84
> >> doduc            29.13       28.41        27.48      26.74
> >> fatigue           8.75        9.03         8.11       8.44
> >> gas_dyn          11.72       11.80         4.47       4.26
> >> induct           24.02       24.91        12.08      13.65
> >> linpk            15.40       15.78        15.74      15.45
> >> mdbx             11.80       12.22        11.86      11.20
> >> nf               28.45       28.50        29.25      27.91
> >> protein          38.15       39.26        37.87      32.49
> >> rnflow           32.25       32.35        26.47      24.06
> >> test_fpu         11.34       11.35         9.31       8.04
> >> tftt              1.91        1.92         1.93       1.87
> >>
> >> Geometric Mean   13.50       13.62        11.34      10.87
> >>
> >> Compile (secs)
> >>                 dragonegg degg+vectorize degg+optnz  gfortran
> >> ac                0.33        0.38         0.72       1.27
> >> aermod           25.91       27.58        32.34      43.91
> >> air               1.07        1.25         1.52       2.25
> >> capacita          0.49        0.52         0.89       1.71
> >> channel           0.29        0.36         0.50       0.62
> >> doduc             1.71        4.50         3.25       5.34
> >> fatigue           0.84        0.97         1.19       1.76
> >> gas_dyn           0.67        0.68         1.20       3.02
> >> induct            1.60        2.14         2.82       3.99
> >> linpk             0.22        0.24         0.47       0.78
> >> mdbx              0.63        0.77         1.16       1.85
> >> nf                0.37        0.40         0.70       1.66
> >> protein           0.93        1.02         1.75       4.01
> >> rnflow            1.20        1.25         2.63       5.44
> >> test_fpu          0.88        0.92         2.13       4.39
> >> tftt              0.21        0.24         0.34       0.56
> >>
> >> Executable (bytes)
> >>                 dragonegg degg+vectorize  degg+optnz  gfortran
> >> ac                26856       26856        39120      50968
> >> aermod          1043700     1055988      1046288    1265640
> >> air               62004       62004        53740      73988
> >> capacita          41416       41416        45552      73896
> >> channel           22808       22808        26768      34784
> >> doduc            128448      128448       136996     197240
> >> fatigue           69824       69824        69840      86080
> >> gas_dyn           59112       59112        67416     119744
> >> induct           163152      167248       167344     174976
> >> linpk             18752       18752        27056      38648
> >> mdbx              53692       53692        57884      82112
> >> nf                23960       23960        32104      71800
> >> protein           75032       75032        87208     132040
> >> rnflow            71896       71896        96632     181120
> >> test_fpu          54272       54272        78776     155072
> >> tftt              18640       18640        18488      30768
> >>
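
For anyone who wants to check the "Geometric Mean" row of the Ave Run
table quoted above, here is a small sketch that recomputes it for the
dragonegg column. The run times are copied from that column; the variable
names are only illustrative. The geometric mean is computed as the
exponential of the average of the logarithms, and for this column it
should reproduce the 13.50 reported in the table.

```shell
# Recompute the geometric mean of the "dragonegg" Ave Run column (seconds).
# Values are copied from the quoted table, ac through tftt.
dragonegg_runs="12.45 16.15 7.10 40.00 2.16 29.13 8.75 11.72 24.02 15.40 11.80 28.45 38.15 32.25 11.34 1.91"

# One value per line into awk; geometric mean = exp(mean(log(x))).
geomean=$(printf '%s\n' $dragonegg_runs |
  awk '{ s += log($1) } END { printf "%.2f", exp(s / NR) }')

echo "dragonegg geometric mean: $geomean"
```

Swapping in another column's values gives the corresponding entry of the
same row.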

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory