[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status

Thu Jun 9 06:44:40 PDT 2011

Hi Jack, thanks for doing this.

>      Below are the tabulated compile times and executable sizes.
>
> A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
> B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
> C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize

These numbers really surprised me: the GCC code generators must be really slow
if the entire set of LLVM IR and codegen optimizations takes less time to run
than GCC codegen (since with -fplugin-arg-dragonegg-enable-gcc-optzns the only
part of GCC being disabled is codegen, i.e. RTL).  I was assuming that I would
need to reduce the LLVM optimization level to get decent speed.  Are you sure
that you built GCC with checking disabled (or --enable-checking=release)?
Can you please also redo this (along with execution times), adding the option
-fplugin-arg-dragonegg-llvm-ir-optimize=2?  I expect that to always result in
a decent compile time win for dragonegg wrt stock gcc-4.5.  If it doesn't have
a significant impact on execution speed, then I'd be tempted to use the formula
   LLVM optimization level = (1 + GCC optimization level) / 2
(integer division, rounding down) as the default, i.e. GCC -O3 -> LLVM -O2,
GCC -O2 -> LLVM -O1, GCC -O1 -> LLVM -O1, GCC -O0 -> LLVM -O0, GCC -O5 -> LLVM
-O3.
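
As a quick check of the formula, here is a minimal Python sketch.  It assumes
the division is integer (floor) division, which is what the listed mappings
imply (GCC -O2 -> LLVM -O1, GCC -O0 -> LLVM -O0).  The function name
llvm_opt_level is made up for illustration; it is not a dragonegg option or API.

```python
def llvm_opt_level(gcc_opt_level: int) -> int:
    """Map a GCC -O level to the proposed default LLVM -O level.

    Hypothetical helper, not part of dragonegg. Floor division is an
    assumption taken from the examples in the mail (GCC -O2 -> LLVM -O1).
    """
    return (1 + gcc_opt_level) // 2


if __name__ == "__main__":
    # Print the mapping for the GCC levels mentioned in the mail.
    for gcc_level in (0, 1, 2, 3, 5):
        print(f"GCC -O{gcc_level} -> LLVM -O{llvm_opt_level(gcc_level)}")
```

With floor division the sketch reproduces all five mappings listed above.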

Best wishes, Duncan.

>
> Compile time (seconds)
>
> Benchmark     A) stock    B) gcc 4.5.4/    C) gcc 4.5.4/
>                 gcc 4.5.4   dragonegg/optzns    dragonegg
>
> ac                0.61        1.65           0.32
> aermod           31.24       25.83          21.02
> air               1.74        1.49           0.81
> capacita          0.83        0.80           0.44
> channel           0.34        0.33           0.25
> doduc             3.09        2.63           1.63
> fatigue           1.04        1.08           0.84
> gas_dyn           0.91        0.95           0.75
> induct            3.18        2.57           1.73
> linpk             0.34        0.30           0.21
> mdbx              1.08        1.01           0.59
> nf                0.39        0.41           0.28
> protein           1.55        1.29           0.97
> rnflow            1.76        1.73           1.26
> test_fpu          1.38        1.40           1.05
> tfft              0.31        0.28           0.19
>
> mean              3.11        2.73           2.02
>
> Executable size (bytes)
>
> Benchmark     A) stock    B) gcc 4.5.4/    C) gcc 4.5.4/
>                 gcc 4.5.4   dragonegg/optzns    dragonegg
>
> ac              26344        30896           26704
> aermod        1145924      1043816         1052056
> air             57404        57700           53532
> capacita        40864        41008           37064
> channel         22448        22664           22664
> doduc          127340       124108          120124
> fatigue         61152        65352           65664
> gas_dyn        647864        58768 !!!       59024
> induct         162360       180440          175312
> linpk           18112        18848           18864
> mdbx            53464        57652           49516
> nf              22560        23784           24080
> protein         74320        74440           74816
> rnflow          66040        71488           71648
> test_fpu        52624        58224           58320
> tfft            18416        18456           18600
>
> On average, stock dragonegg compiles in 26% less time than dragonegg with
> optzns, which in turn takes 12% less time than stock gcc 4.5.4. The most
> interesting executable size difference is gas_dyn, which runs fastest with
> optzns but is 11x larger with stock gcc 4.5.4 than with either stock
> dragonegg or dragonegg with optzns. This is likely much improved in gcc 4.6
> with the new -fwhole-file default.
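
The percentages in this summary can be reproduced from the compile-time table.
The Python sketch below is only an illustration: the data is copied from the
table, and the helper names (column_means, percent_less) are invented here.
It suggests that both figures are measured relative to the slower
configuration; measured relative to stock dragonegg instead, dragonegg with
optzns takes roughly 35% longer.

```python
from statistics import mean

# Compile times in seconds, copied from the table: (A, B, C) per benchmark.
# A = stock gcc 4.5.4, B = dragonegg with optzns, C = stock dragonegg.
compile_times = {
    "ac": (0.61, 1.65, 0.32),
    "aermod": (31.24, 25.83, 21.02),
    "air": (1.74, 1.49, 0.81),
    "capacita": (0.83, 0.80, 0.44),
    "channel": (0.34, 0.33, 0.25),
    "doduc": (3.09, 2.63, 1.63),
    "fatigue": (1.04, 1.08, 0.84),
    "gas_dyn": (0.91, 0.95, 0.75),
    "induct": (3.18, 2.57, 1.73),
    "linpk": (0.34, 0.30, 0.21),
    "mdbx": (1.08, 1.01, 0.59),
    "nf": (0.39, 0.41, 0.28),
    "protein": (1.55, 1.29, 0.97),
    "rnflow": (1.76, 1.73, 1.26),
    "test_fpu": (1.38, 1.40, 1.05),
    "tfft": (0.31, 0.28, 0.19),
}


def column_means(table):
    """Arithmetic mean of each column (A, B, C) across all benchmarks."""
    return tuple(mean(column) for column in zip(*table.values()))


def percent_less(slower, faster):
    """Time saved by `faster`, as a percentage of the `slower` time."""
    return 100.0 * (slower - faster) / slower


mean_a, mean_b, mean_c = column_means(compile_times)

if __name__ == "__main__":
    print(f"means: A={mean_a:.2f} B={mean_b:.2f} C={mean_c:.2f}")
    print(f"B vs A: {percent_less(mean_a, mean_b):.0f}% less time")
    print(f"C vs B: {percent_less(mean_b, mean_c):.0f}% less time")
    print(f"B vs C: {100.0 * (mean_b - mean_c) / mean_c:.0f}% more time")
```
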
>
> On Thu, Jun 09, 2011 at 09:51:51AM +0200, Duncan Sands wrote:
>> Hi Jack, thanks for these numbers.  Can you also please measure compile times?
>> I'm thinking of enabling gcc optimizations by default, but I don't want to
>> increase compile times, which means choosing a value for the
>> -fplugin-arg-dragonegg-llvm-ir-optimize option that is low enough to get good
>> compile times, yet high enough to get fast code.  It would be great if you could
>> play around with this to find a good choice.
>>
>> Best wishes, Duncan.
>>
>>>     Current dragonegg svn has all of the -fplugin-arg-dragonegg-enable-gcc-optzns bugs for
>>> usage with -ffast-math -O3 addressed except for those related to PR2314. Using the -fno-tree-vectorize
>>> option, we can evaluate the current state of -fplugin-arg-dragonegg-enable-gcc-optzns with
>>> the Polyhedron 2005 benchmarks compared to stock dragonegg and stock gcc 4.5.4. The runtime
>>> benchmarks below show that we average slightly faster than stock gcc 4.5.4 and significantly
>>> faster than stock dragonegg through the use of -fplugin-arg-dragonegg-enable-gcc-optzns.
>>>
>>> x86_64 darwin
>>>
>>> A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
>>> B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
>>> C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize
>>>
>>>
>>> Benchmark     A) stock    B) gcc 4.5.4/    C) gcc 4.5.4/
>>>                 gcc 4.5.4   dragonegg/optzns    dragonegg
>>>
>>> ac               9.58          9.13            12.30
>>> aermod          20.88         16.10            17.62
>>> air              6.16          6.59             7.70
>>> capacita        35.68         39.94            46.22
>>> channel          2.03          2.04             1.96
>>> doduc           28.28         28.43            30.41
>>> fatigue          8.13          7.19            10.40
>>> gas_dyn         10.10          9.83            11.73
>>> induct          20.17         20.76            48.76
>>> linpk           15.42         15.65            15.69
>>> mdbx            11.42         11.73            12.07
>>> nf              27.99         28.60            29.39
>>> protein         38.36         39.08            39.98
>>> rnflow          27.28         28.19            31.90
>>> test_fpu        11.43         11.17            11.50
>>> tfft             1.91          1.95             2.16
>>>
>>> Mean            12.72         12.62            14.71
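
The thread does not say how the "Mean" row of this table is computed.  The
Python sketch below is an observation from the numbers, not something stated
in the mail: it compares the arithmetic and geometric means of each column,
and the geometric mean is the one that lands on the reported values.  The data
is copied from the table; the helper name geometric_mean is defined here for
illustration.

```python
import math

# Runtime figures copied from the table: (A, B, C) per benchmark.
# A = stock gcc 4.5.4, B = dragonegg with optzns, C = stock dragonegg.
runtimes = {
    "ac": (9.58, 9.13, 12.30),
    "aermod": (20.88, 16.10, 17.62),
    "air": (6.16, 6.59, 7.70),
    "capacita": (35.68, 39.94, 46.22),
    "channel": (2.03, 2.04, 1.96),
    "doduc": (28.28, 28.43, 30.41),
    "fatigue": (8.13, 7.19, 10.40),
    "gas_dyn": (10.10, 9.83, 11.73),
    "induct": (20.17, 20.76, 48.76),
    "linpk": (15.42, 15.65, 15.69),
    "mdbx": (11.42, 11.73, 12.07),
    "nf": (27.99, 28.60, 29.39),
    "protein": (38.36, 39.08, 39.98),
    "rnflow": (27.28, 28.19, 31.90),
    "test_fpu": (11.43, 11.17, 11.50),
    "tfft": (1.91, 1.95, 2.16),
}


def geometric_mean(values):
    """Geometric mean, computed as exp of the mean of the logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    """Plain average, for comparison with the geometric mean."""
    values = list(values)
    return sum(values) / len(values)


columns = list(zip(*runtimes.values()))
geo_means = [geometric_mean(col) for col in columns]
arith_means = [arithmetic_mean(col) for col in columns]

if __name__ == "__main__":
    for label, geo, arith in zip("ABC", geo_means, arith_means):
        print(f"{label}: geometric {geo:.2f}, arithmetic {arith:.2f}")
```
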
>>>
>>> Once vector_select() is implemented we can retest without -fno-tree-vectorize.
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev