[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status

Thu Jun 9 07:20:12 PDT 2011

On Thu, Jun 09, 2011 at 03:44:40PM +0200, Duncan Sands wrote:
> Hi Jack, thanks for doing this.
>
>>      Below are the tabulated compile times and executable sizes.
>>
>> A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
>> B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
>> C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize
>
> These numbers really surprised me: the GCC code generators must be really slow
> if the entire set of LLVM IR and codegen optimizations takes less time to run
> than GCC codegen (since with -fplugin-arg-dragonegg-enable-gcc-optzns the only
> part of GCC being disabled is codegen, i.e. RTL).  I was assuming that I would
> need to reduce the LLVM optimization level to get decent speed.  Are you sure
> that you built GCC with checking disabled (or --enable-checking=release)?

I built gcc-4.5.4 from svn with --enable-checking=yes. I'll rebuild gcc-4.5.4 with
--enable-checking=release and repeat the benchmarks.

> Can you please also redo this (along with execution times), adding the option
> -fplugin-arg-dragonegg-llvm-ir-optimize=2?  I expect that to always result in
> a decent compile time win for dragonegg wrt stock gcc-4.5.  If it doesn't have
> a significant impact on execution speed, then I'd be tempted to use the formula
>   LLVM optimization level = (1 + GCC optimization level) / 2   (rounded down)
> as the default, i.e. GCC -O3 -> LLVM -O2, GCC -O2 -> LLVM -O1, GCC -O1 -> LLVM
> -O1, GCC -O0 -> LLVM -O0, GCC -O5 -> LLVM -O3.
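
The proposed default can be sketched in a few lines of Python, assuming the division in the formula is integer (truncating) division, which is what the listed examples imply. The function name `llvm_opt_level` is hypothetical and is not part of dragonegg.

```python
def llvm_opt_level(gcc_level: int) -> int:
    """Map a GCC -O level to the proposed LLVM -O level.

    Assumption: "/ 2" means integer division, as the examples
    in the message (-O2 -> -O1, -O0 -> -O0) suggest.
    """
    if gcc_level < 0:
        raise ValueError("optimization level must be non-negative")
    return (1 + gcc_level) // 2


if __name__ == "__main__":
    # Print the mapping for the levels mentioned in the message.
    for gcc_level in (0, 1, 2, 3, 5):
        print(f"GCC -O{gcc_level} -> LLVM -O{llvm_opt_level(gcc_level)}")
```

With truncating division, the levels 0, 1, 2, 3 and 5 map to 0, 1, 1, 2 and 3, matching the examples given.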

I'll try this after I repeat the initial benchmarks with --enable-checking=release.
        Jack
>
> Best wishes, Duncan.
>
>>
>> Compile time (seconds)
>>
>> Benchmark     A) stock    B) gcc 4.5.4/    C) gcc 4.5.4/
>>                 gcc 4.5.4   dragonegg/optzns    dragonegg
>>
>> ac                0.61        1.65           0.32
>> aermod           31.24       25.83          21.02
>> air               1.74        1.49           0.81
>> capacita          0.83        0.80           0.44
>> channel           0.34        0.33           0.25
>> doduc             3.09        2.63           1.63
>> fatigue           1.04        1.08           0.84
>> gas_dyn           0.91        0.95           0.75
>> induct            3.18        2.57           1.73
>> linpk             0.34        0.30           0.21
>> mdbx              1.08        1.01           0.59
>> nf                0.39        0.41           0.28
>> protein           1.55        1.29           0.97
>> rnflow            1.76        1.73           1.26
>> test_fpu          1.38        1.40           1.05
>> tfft              0.31        0.28           0.19
>>
>> mean              3.11        2.73           2.02
>>
>> Executable size (bytes)
>>
>> Benchmark     A) stock    B) gcc 4.5.4/    C) gcc 4.5.4/
>>                 gcc 4.5.4   dragonegg/optzns    dragonegg
>>
>> ac              26344        30896           26704
>> aermod        1145924      1043816         1052056
>> air             57404        57700           53532
>> capacita        40864        41008           37064
>> channel         22448        22664           22664
>> doduc          127340       124108          120124
>> fatigue         61152        65352           65664
>> gas_dyn        647864        58768 !!!       59024
>> induct         162360       180440          175312
>> linpk           18112        18848           18864
>> mdbx            53464        57652           49516
>> nf              22560        23784           24080
>> protein         74320        74440           74816
>> rnflow          66040        71488           71648
>> test_fpu        52624        58224           58320
>> tfft            18416        18456           18600
>>
>> On average, compile times with optzns are 35% slower than stock dragonegg
>> (2.73 vs. 2.02 seconds) but 12% faster than stock gcc 4.5.4. The most
>> interesting executable size difference is gas_dyn, which is fastest with
>> optzns but is 11x larger with stock gcc 4.5.4 than with either stock
>> dragonegg or dragonegg with optzns. This is likely much improved in
>> gcc 4.6 with the new -fwhole-file default.
>>
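
The size of the compile-time gap depends on which configuration is taken as the baseline. The following is a minimal sketch that recomputes the relative differences from the mean row of the compile-time table, plus the gas_dyn size ratio from the executable-size table. The helper name `pct_change` and the constant names are hypothetical.

```python
def pct_change(value: float, baseline: float) -> float:
    """Percentage by which `value` differs from `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Mean compile times (seconds), copied from the table above.
MEAN_GCC = 3.11        # A) stock gcc 4.5.4
MEAN_OPTZNS = 2.73     # B) dragonegg with optzns
MEAN_DRAGONEGG = 2.02  # C) stock dragonegg

# gas_dyn executable sizes (bytes), copied from the table above.
GAS_DYN_GCC = 647864
GAS_DYN_OPTZNS = 58768

if __name__ == "__main__":
    # optzns measured against stock dragonegg as the baseline
    print(f"optzns vs dragonegg: {pct_change(MEAN_OPTZNS, MEAN_DRAGONEGG):+.1f}%")
    # stock dragonegg measured against optzns as the baseline
    print(f"dragonegg vs optzns: {pct_change(MEAN_DRAGONEGG, MEAN_OPTZNS):+.1f}%")
    # optzns measured against stock gcc as the baseline
    print(f"optzns vs gcc:       {pct_change(MEAN_OPTZNS, MEAN_GCC):+.1f}%")
    print(f"gas_dyn size ratio:  {GAS_DYN_GCC / GAS_DYN_OPTZNS:.1f}x")
```

Relative to stock dragonegg, optzns takes roughly 35% longer; relative to optzns, stock dragonegg takes roughly 26% less time. Both figures describe the same pair of means.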
>> On Thu, Jun 09, 2011 at 09:51:51AM +0200, Duncan Sands wrote:
>>> Hi Jack, thanks for these numbers.  Can you also please measure compile times?
>>> I'm thinking of enabling gcc optimizations by default, but I don't want to
>>> increase compile times, which means choosing a value for the
>>> -fplugin-arg-dragonegg-llvm-ir-optimize option that is low enough to get good
>>> compile times, yet high enough to get fast code.  It would be great if you could
>>> play around with this to find a good choice.
>>>
>>> Best wishes, Duncan.
>>>
>>>>     Current dragonegg svn has all of the -fplugin-arg-dragonegg-enable-gcc-optzns bugs for
>>>> usage with -ffast-math -O3 addressed except for those related to PR2314. Using the -fno-tree-vectorize
>>>> option, we can evaluate the current state of -fplugin-arg-dragonegg-enable-gcc-optzns with
>>>> the Polyhedron 2005 benchmarks compared to stock dragonegg and stock gcc 4.5.4. The runtime
>>>> benchmarks below show that we average slightly faster than stock gcc 4.5.4 and significantly
>>>> faster than stock dragonegg through the use of -fplugin-arg-dragonegg-enable-gcc-optzns.
>>>>
>>>> x86_64 darwin
>>>>
>>>> A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
>>>> B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
>>>> C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize
>>>>
>>>>
>>>> Benchmark     A) stock    B) gcc 4.5.4/    C) gcc 4.5.4/
>>>>                 gcc 4.5.4   dragonegg/optzns    dragonegg
>>>>
>>>> ac               9.58          9.13            12.30
>>>> aermod          20.88         16.10            17.62
>>>> air              6.16          6.59             7.70
>>>> capacita        35.68         39.94            46.22
>>>> channel          2.03          2.04             1.96
>>>> doduc           28.28         28.43            30.41
>>>> fatigue          8.13          7.19            10.40
>>>> gas_dyn         10.10          9.83            11.73
>>>> induct          20.17         20.76            48.76
>>>> linpk           15.42         15.65            15.69
>>>> mdbx            11.42         11.73            12.07
>>>> nf              27.99         28.60            29.39
>>>> protein         38.36         39.08            39.98
>>>> rnflow          27.28         28.19            31.90
>>>> test_fpu        11.43         11.17            11.50
>>>> tfft             1.91          1.95             2.16
>>>>
>>>> Geometric mean  12.72         12.62            14.71
>>>>
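
The summary row of the run-time table appears to be a geometric mean rather than an arithmetic one. A minimal sketch that checks this, using the run times copied from the table above; the dictionary keys and function names are illustrative only.

```python
import math

# Run times copied from the table above, in benchmark order:
# ac, aermod, air, capacita, channel, doduc, fatigue, gas_dyn,
# induct, linpk, mdbx, nf, protein, rnflow, test_fpu, tfft
RUNTIMES = {
    "A": [9.58, 20.88, 6.16, 35.68, 2.03, 28.28, 8.13, 10.10,
          20.17, 15.42, 11.42, 27.99, 38.36, 27.28, 11.43, 1.91],
    "B": [9.13, 16.10, 6.59, 39.94, 2.04, 28.43, 7.19, 9.83,
          20.76, 15.65, 11.73, 28.60, 39.08, 28.19, 11.17, 1.95],
    "C": [12.30, 17.62, 7.70, 46.22, 1.96, 30.41, 10.40, 11.73,
          48.76, 15.69, 12.07, 29.39, 39.98, 31.90, 11.50, 2.16],
}


def geometric_mean(values):
    """exp of the mean of the logs; requires strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    """Plain average, shown for comparison."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, times in RUNTIMES.items():
        print(f"{name}: geometric {geometric_mean(times):.2f}, "
              f"arithmetic {arithmetic_mean(times):.2f}")
```

The geometric means agree with the summary row to two decimals, whereas the arithmetic mean for column A is about 17.18. The mean row of the compile-time table, by contrast, matches a plain arithmetic average.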
>>>> Once vector_select() is implemented we can retest without -fno-tree-vectorize.
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev