[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Duncan Sands
baldrick at free.fr
Thu Jun 9 06:44:40 PDT 2011
Hi Jack, thanks for doing this.
> Below are the tabulated compile times and executable sizes.
>
> A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
> B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
> C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize
These numbers really surprised me: the GCC code generators must be really slow
if the entire set of LLVM IR and codegen optimizations takes less time to run
than GCC codegen (since with -fplugin-arg-dragonegg-enable-gcc-optzns the only
part of GCC being disabled is codegen, i.e. RTL). I was assuming that I would
need to reduce the LLVM optimization level to get decent speed. Are you sure
that you built GCC with checking disabled (or --enable-checking=release)?
Can you please also redo this (along with execution times), adding the option
-fplugin-arg-dragonegg-llvm-ir-optimize=2? I expect that to always result in
a decent compile time win for dragonegg wrt stock gcc-4.5. If it doesn't have
a significant impact on execution speed, then I'd be tempted to use the formula
LLVM optimization level = (1 + GCC optimization level) / 2
as the default (using integer division, so the result rounds down), i.e.
GCC -O3 -> LLVM -O2, GCC -O2 -> LLVM -O1, GCC -O1 -> LLVM -O1,
GCC -O0 -> LLVM -O0, GCC -O5 -> LLVM -O3.
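As a rough illustration of the proposed default, here is a minimal sketch of the mapping in Python. The function name llvm_opt_level is hypothetical and not part of dragonegg; the only thing taken from the formula above is the integer division, which is what makes GCC -O2 map to LLVM -O1.

```python
def llvm_opt_level(gcc_level: int) -> int:
    """Map a GCC -O level to the proposed default LLVM optimization level.

    Hypothetical helper for illustration only; dragonegg itself may
    compute this differently.
    """
    if gcc_level < 0:
        raise ValueError("optimization level must be non-negative")
    # Integer division rounds down, so -O2 -> 1 and -O0 -> 0.
    return (1 + gcc_level) // 2


if __name__ == "__main__":
    for gcc in (0, 1, 2, 3, 5):
        print(f"GCC -O{gcc} -> LLVM -O{llvm_opt_level(gcc)}")
```

Running it prints one line for each of the five GCC levels listed above.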
Best wishes, Duncan.
>
> Compile time (seconds)
>
> Benchmark   A) stock     B) gcc 4.5.4/       C) gcc 4.5.4/
>             gcc 4.5.4    dragonegg/optzns    dragonegg
>
> ac            0.61         1.65                0.32
> aermod       31.24        25.83               21.02
> air           1.74         1.49                0.81
> capacita      0.83         0.80                0.44
> channel       0.34         0.33                0.25
> doduc         3.09         2.63                1.63
> fatigue       1.04         1.08                0.84
> gas_dyn       0.91         0.95                0.75
> induct        3.18         2.57                1.73
> linpk         0.34         0.30                0.21
> mdbx          1.08         1.01                0.59
> nf            0.39         0.41                0.28
> protein       1.55         1.29                0.97
> rnflow        1.76         1.73                1.26
> test_fpu      1.38         1.40                1.05
> tfft          0.31         0.28                0.19
>
> mean          3.11         2.73                2.02
>
> Executable size (bytes)
>
> Benchmark   A) stock     B) gcc 4.5.4/       C) gcc 4.5.4/
>             gcc 4.5.4    dragonegg/optzns    dragonegg
>
> ac             26344        30896               26704
> aermod       1145924      1043816             1052056
> air            57404        57700               53532
> capacita       40864        41008               37064
> channel        22448        22664               22664
> doduc         127340       124108              120124
> fatigue        61152        65352               65664
> gas_dyn       647864        58768 !!!           59024
> induct        162360       180440              175312
> linpk          18112        18848               18864
> mdbx           53464        57652               49516
> nf             22560        23784               24080
> protein        74320        74440               74816
> rnflow         66040        71488               71648
> test_fpu       52624        58224               58320
> tfft           18416        18456               18600
>
> Mean compile time with optzns (2.73 s) is about 35% higher than with stock
> dragonegg (2.02 s) but 12% lower than with stock gcc 4.5.4 (3.11 s). The
> most interesting executable size difference is gas_dyn, which is fastest
> with optzns but 11x larger with stock gcc 4.5.4 than with either stock
> dragonegg or dragonegg with optzns. This is likely much improved in gcc 4.6
> with the new -fwhole-file default.
>
> On Thu, Jun 09, 2011 at 09:51:51AM +0200, Duncan Sands wrote:
>> Hi Jack, thanks for these numbers. Can you also please measure compile times?
>> I'm thinking of enabling gcc optimizations by default, but I don't want to
>> increase compile times, which means choosing a value for the
>> -fplugin-arg-dragonegg-llvm-ir-optimize option that is low enough to get good
>> compile times, yet high enough to get fast code. It would be great if you could
>> play around with this to find a good choice.
>>
>> Best wishes, Duncan.
>>
>>> Current dragonegg svn has all of the -fplugin-arg-dragonegg-enable-gcc-optzns bugs for
>>> usage with -ffast-math -O3 addressed except for those related to PR2314. Using the -fno-tree-vectorize
>>> option, we can evaluate the current state of -fplugin-arg-dragonegg-enable-gcc-optzns with
>>> the Polyhedron 2005 benchmarks compared to stock dragonegg and stock gcc 4.5.4. The runtime
>>> benchmarks below show that we average slightly faster than stock gcc 4.5.4 and significantly
>>> faster than stock dragonegg through the use of -fplugin-arg-dragonegg-enable-gcc-optzns.
>>>
>>> x86_64 darwin
>>>
>>> A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
>>> B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
>>> C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize
>>>
>>>
>>> Benchmark   A) stock     B) gcc 4.5.4/       C) gcc 4.5.4/
>>>             gcc 4.5.4    dragonegg/optzns    dragonegg
>>>
>>> ac            9.58         9.13               12.30
>>> aermod       20.88        16.10               17.62
>>> air           6.16         6.59                7.70
>>> capacita     35.68        39.94               46.22
>>> channel       2.03         2.04                1.96
>>> doduc        28.28        28.43               30.41
>>> fatigue       8.13         7.19               10.40
>>> gas_dyn      10.10         9.83               11.73
>>> induct       20.17        20.76               48.76
>>> linpk        15.42        15.65               15.69
>>> mdbx         11.42        11.73               12.07
>>> nf           27.99        28.60               29.39
>>> protein      38.36        39.08               39.98
>>> rnflow       27.28        28.19               31.90
>>> test_fpu     11.43        11.17               11.50
>>> tfft          1.91         1.95                2.16
>>>
>>> Mean         12.72        12.62               14.71
>>>
>>> Once vector_select() is implemented we can retest without -fno-tree-vectorize.
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev