[LLVMdev] pb05 results for current llvm/dragonegg

Tue Apr 3 07:01:56 PDT 2012

On Tue, Apr 03, 2012 at 08:33:33AM -0500, Hal Finkel wrote:
> On Tue, 3 Apr 2012 08:57:51 -0400
> Jack Howarth <howarth at bromo.med.uc.edu> wrote:
> 
> > On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> > > Hi Jack,
> > >
> > >>    Attached are the Polyhedron 2005 benchmark results for current
> > >> llvm/dragonegg svn on x86_64-apple-darwin11 built against Xcode
> > >> 4.3.2 and FSF gcc 4.6.3.
> > >
> > > thanks for the numbers.  How does this compare to LLVM 3.0 - were
> > > there any regressions?
> > 
> > The results from just before llvm/dragonegg 3.0 was released are at...
> > 
> > http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html
> > 
> > It does look as if the ac benchmark has regressed from 10.80 sec
> > in llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. These are
> > slightly different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3), but I
> > would be shocked if that was the origin of the performance regression.
> >    The results for -fplugin-arg-dragonegg-enable-gcc-optzns don't
> > seem much improved in llvm 3.1 so I assume this means little progress
> > was made in eliminating the scalarization of vectorizations in this
> > release. Did we even get any code added to llvm that would allow us
> > to identify instances of these scalarizations through a compiler
> > warning? Also, the current
> > -fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do
> > almost nothing in terms of vectorization. Do we need to pass any
> > additional flags to actually achieve autovectorization via llvm 
> 
> Currently, we only have basic-block vectorization, so to get
> autovectorization of loops (which is probably what we want here), the
> loops need to be unrolled. I see that all categories include
> -funroll-loops; does that do anything if we're not using gcc's
> optimizations?
> 
> I generally run with both -unroll-allow-partial and -unroll-runtime so
> that llvm's unroller will do as much as it can. Also, in many of these
> cases, it looks like the vectorization is doing *something*, just not
> anything overly helpful ;) -vectorize is new, so it is helpful to
> get feedback on what is actually useful.
> 
> You might try including -bb-vectorize-aligned-only (sse3 does not
> actually have unaligned load/stores, right?). Other things to try
> include -bb-vectorize-no-ints (determining when to vectorize integer
> ops may be trickier than floating-point ops) and setting the required
> chain depth to something less than the current default of 6 (for
> example, -bb-vectorize-req-chain-depth=3), which will cause a lot more
> vectorization.

So each of these needs to be passed in its own instance of
-fplugin-arg-dragonegg-llvm-option=, I guess. I'll try...

de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-llvm-option=-vectorize -fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial -fplugin-arg-dragonegg-llvm-option=-unroll-runtime -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints %n.f90 -o %n
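
Since every LLVM option needs its own -fplugin-arg-dragonegg-llvm-option= prefix, the command line gets long quickly. Below is a minimal sketch of a helper that builds it from a plain list of LLVM options. The function name dragonegg_command and its parameters are made up for illustration; only the compiler name and the flags come from the command above.

```python
# Hypothetical helper (not part of dragonegg): wrap each LLVM option in
# its own -fplugin-arg-dragonegg-llvm-option= instance.
PLUGIN_PREFIX = "-fplugin-arg-dragonegg-llvm-option="


def dragonegg_command(source, llvm_options, compiler="de-gfortran46",
                      base_flags=("-msse3", "-ffast-math",
                                  "-funroll-loops", "-O3")):
    """Return the argv list for building one benchmark source file."""
    # Output name is the source name without its extension (ac.f90 -> ac).
    output = source.rsplit(".", 1)[0]
    argv = [compiler, *base_flags]
    argv += [PLUGIN_PREFIX + opt for opt in llvm_options]
    argv += [source, "-o", output]
    return argv


llvm_options = [
    "-vectorize",
    "-unroll-allow-partial",
    "-unroll-runtime",
    "-bb-vectorize-aligned-only",
    "-bb-vectorize-no-ints",
]
print(" ".join(dragonegg_command("ac.f90", llvm_options)))
```

Keeping the LLVM options in one list makes it easy to add or drop a single flag between benchmark runs without retyping the prefix.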

Unfortunately it doesn't seem that dragonegg can currently parse something like...

-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3

% de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-llvm-option=-vectorize -fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial -fplugin-arg-dragonegg-llvm-option=-unroll-runtime -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3 ac.f90 -o ac
f951: error: malformed option -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3 (multiple '=' signs)
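
The error is reported by f951 while it handles the plugin argument, before the option ever reaches LLVM. As a rough model of what seems to be going on, here is a sketch that assumes a plugin argument has the form -fplugin-arg-<name>-<key>[=<value>] and that anything with more than one '=' is rejected. The function parse_plugin_arg is hypothetical and is not the actual GCC or dragonegg code.

```python
PREFIX = "-fplugin-arg-"


def parse_plugin_arg(option):
    """Split -fplugin-arg-<name>-<key>[=<value>] into (name, key, value).

    Illustrative model only; not the actual GCC or dragonegg code.
    """
    if not option.startswith(PREFIX):
        raise ValueError(f"not a plugin argument: {option}")
    # The plugin name runs up to the first '-'; the rest is key[=value].
    name, dash, key_value = option[len(PREFIX):].partition("-")
    if not dash or not key_value:
        raise ValueError(
            f"malformed option {option} (missing -<key>[=<value>])")
    # Assumed check: more than one '=' is rejected outright, so a value
    # that itself contains '=' cannot be expressed.
    if key_value.count("=") > 1:
        raise ValueError(f"malformed option {option} (multiple '=' signs)")
    key, _, value = key_value.partition("=")
    return name, key, value or None


for opt in ("-fplugin-arg-dragonegg-llvm-option=-vectorize",
            "-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3"):
    try:
        print(parse_plugin_arg(opt))
    except ValueError as err:
        print("error:", err)
```

Under that assumption, any LLVM option of the form -flag=value would hit the same problem, not just -bb-vectorize-req-chain-depth=3.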

Duncan, any idea how to work around that for passing -bb-vectorize-req-chain-depth=3?
          Jack

> 
>  -Hal
> 
> (in
> > absence of -ftree-vectorize and
> > -fplugin-arg-dragonegg-enable-gcc-optzns)? Jack
> > 
> > >
> > > Ciao, Duncan.
> > >
> > >  The benchmarks
> > >> for -msse3 and -msse4 appear identical (at least for degg+optnz).
> > >> This is fortunate since there seems to be a bug in -msse4 on 2.33
> > >> GHz (T7600) Intel Core 2 Duo Merom
> > >> (http://llvm.org/bugs/show_bug.cgi?id=12434). Jack
> > >>
> > >> llvm/dragonegg r153877
> > >>
> > >> dragonegg:
> > >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
> > >>
> > >> degg+vectorize:
> > >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
> > >> -fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n
> > >>
> > >> degg+optnz:
> > >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
> > >> -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n
> > >>
> > >> gfortran:
> > >> gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
> > >>
> > >> Ave Run (secs)
> > >>                 dragonegg degg+vectorize degg+optnz  gfortran
> > >> ac               12.45       12.45         8.85       8.80
> > >> aermod           16.15       16.05        14.80      17.48
> > >> air               7.10        7.11         6.46       5.50
> > >> capacita         40.00       39.96        37.72      32.62
> > >> channel           2.16        2.15         1.99       1.84
> > >> doduc            29.13       28.41        27.48      26.74
> > >> fatigue           8.75        9.03         8.11       8.44
> > >> gas_dyn          11.72       11.80         4.47       4.26
> > >> induct           24.02       24.91        12.08      13.65
> > >> linpk            15.40       15.78        15.74      15.45
> > >> mdbx             11.80       12.22        11.86      11.20
> > >> nf               28.45       28.50        29.25      27.91
> > >> protein          38.15       39.26        37.87      32.49
> > >> rnflow           32.25       32.35        26.47      24.06
> > >> test_fpu         11.34       11.35         9.31       8.04
> > >> tftt              1.91        1.92         1.93       1.87
> > >>
> > >> Geometric Mean   13.50       13.62        11.34      10.87
> > >>
> > >> Compile (secs)
> > >>                 dragonegg degg+vectorize degg+optnz  gfortran
> > >> ac                0.33        0.38         0.72       1.27
> > >> aermod           25.91       27.58        32.34      43.91
> > >> air               1.07        1.25         1.52       2.25
> > >> capacita          0.49        0.52         0.89       1.71
> > >> channel           0.29        0.36         0.50       0.62
> > >> doduc             1.71        4.50         3.25       5.34
> > >> fatigue           0.84        0.97         1.19       1.76
> > >> gas_dyn           0.67        0.68         1.20       3.02
> > >> induct            1.60        2.14         2.82       3.99
> > >> linpk             0.22        0.24         0.47       0.78
> > >> mdbx              0.63        0.77         1.16       1.85
> > >> nf                0.37        0.40         0.70       1.66
> > >> protein           0.93        1.02         1.75       4.01
> > >> rnflow            1.20        1.25         2.63       5.44
> > >> test_fpu          0.88        0.92         2.13       4.39
> > >> tftt              0.21        0.24         0.34       0.56
> > >>
> > >> Executable (bytes)
> > >>                 dragonegg degg+vectorize  degg+optnz  gfortran
> > >> ac                26856       26856        39120      50968
> > >> aermod          1043700     1055988      1046288    1265640
> > >> air               62004       62004        53740      73988
> > >> capacita          41416       41416        45552      73896
> > >> channel           22808       22808        26768      34784
> > >> doduc            128448      128448       136996     197240
> > >> fatigue           69824       69824        69840      86080
> > >> gas_dyn           59112       59112        67416     119744
> > >> induct           163152      167248       167344     174976
> > >> linpk             18752       18752        27056      38648
> > >> mdbx              53692       53692        57884      82112
> > >> nf                23960       23960        32104      71800
> > >> protein           75032       75032        87208     132040
> > >> rnflow            71896       71896        96632     181120
> > >> test_fpu          54272       54272        78776     155072
> > >> tftt              18640       18640        18488      30768
> > >>
> 
> 
> 
> -- 
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory