[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Jack Howarth
howarth at bromo.med.uc.edu
Thu Jun 9 07:20:12 PDT 2011
On Thu, Jun 09, 2011 at 03:44:40PM +0200, Duncan Sands wrote:
> Hi Jack, thanks for doing this.
>
>> Below are the tabulated compile times and executable sizes.
>>
>> A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
>> B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
>> C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize
>
> These numbers really surprised me: the GCC code generators must be really slow
> if the entire set of LLVM IR and codegen optimizations takes less time to run
> than GCC codegen (since with -fplugin-arg-dragonegg-enable-gcc-optzns the only
> part of GCC being disabled is codegen, i.e. RTL). I was assuming that I would
> need to reduce the LLVM optimization level to get decent speed. Are you sure
> that you built GCC with checking disabled (or --enable-checking=release)?
I built gcc-4.5.4 from svn with --enable-checking=yes. I'll rebuild gcc-4.5.4 with
--enable-checking=release and repeat the benchmarks.
> Can you please also redo this (along with execution times), adding the option
> -fplugin-arg-dragonegg-llvm-ir-optimize=2. I expect that to always result in
> a decent compile time win for dragonegg wrt stock gcc-4.5. If it doesn't have
> a significant impact on execution speed, then I'd be tempted to use the formula
> LLVM optimization level = (1 + GCC optimization level) / 2
> as the default, i.e. GCC -O3 -> LLVM -O2, GCC -O2 -> LLVM -O1, GCC -O1 -> LLVM
> -O1, GCC -O0 -> LLVM -O0, GCC -O5 -> LLVM -O3.
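
The proposed default can be written down directly. This is only a sketch: the function name is hypothetical, and the use of integer (floor) division is inferred from the listed mappings (GCC -O2 -> LLVM -O1, GCC -O0 -> LLVM -O0), not taken from the dragonegg source.

```python
def llvm_opt_level(gcc_opt_level: int) -> int:
    """Map a GCC -O level to the proposed default LLVM -O level.

    Implements LLVM level = (1 + GCC level) / 2 with floor division,
    which is an assumption inferred from the examples in the message.
    """
    if gcc_opt_level < 0:
        raise ValueError("optimization level must be non-negative")
    return (1 + gcc_opt_level) // 2


if __name__ == "__main__":
    # Print the mapping for the levels mentioned in the message.
    for gcc_level in (0, 1, 2, 3, 5):
        print(f"GCC -O{gcc_level} -> LLVM -O{llvm_opt_level(gcc_level)}")
```

With floor division the function reproduces every pairing listed above, including GCC -O5 -> LLVM -O3.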
I'll try this after I repeat the initial benchmarks with --enable-checking=release.
Jack
>
> Best wishes, Duncan.
>
>>
>> Compile time (seconds)
>>
>> Benchmark   A) stock    B) gcc 4.5.4/      C) gcc 4.5.4/
>>             gcc 4.5.4   dragonegg/optzns   dragonegg
>>
>> ac               0.61        1.65               0.32
>> aermod          31.24       25.83              21.02
>> air              1.74        1.49               0.81
>> capacita         0.83        0.80               0.44
>> channel          0.34        0.33               0.25
>> doduc            3.09        2.63               1.63
>> fatigue          1.04        1.08               0.84
>> gas_dyn          0.91        0.95               0.75
>> induct           3.18        2.57               1.73
>> linpk            0.34        0.30               0.21
>> mdbx             1.08        1.01               0.59
>> nf               0.39        0.41               0.28
>> protein          1.55        1.29               0.97
>> rnflow           1.76        1.73               1.26
>> test_fpu         1.38        1.40               1.05
>> tfft             0.31        0.28               0.19
>>
>> mean             3.11        2.73               2.02
>>
>> Executable size (bytes)
>>
>> Benchmark   A) stock    B) gcc 4.5.4/      C) gcc 4.5.4/
>>             gcc 4.5.4   dragonegg/optzns   dragonegg
>>
>> ac              26344       30896              26704
>> aermod        1145924     1043816            1052056
>> air             57404       57700              53532
>> capacita        40864       41008              37064
>> channel         22448       22664              22664
>> doduc          127340      124108             120124
>> fatigue         61152       65352              65664
>> gas_dyn        647864       58768 !!!          59024
>> induct         162360      180440             175312
>> linpk           18112       18848              18864
>> mdbx            53464       57652              49516
>> nf              22560       23784              24080
>> protein         74320       74440              74816
>> rnflow          66040       71488              71648
>> test_fpu        52624       58224              58320
>> tfft            18416       18456              18600
>>
>> Stock dragonegg compiles in 26% less time than dragonegg with optzns,
>> which in turn takes 12% less time than stock gcc 4.5.4. The most
>> interesting executable size difference is gas_dyn, which is fastest
>> with optzns but is 11x larger with stock gcc 4.5.4 than with either
>> stock dragonegg or dragonegg with optzns. This is likely much improved
>> in gcc 4.6 with the new -fwhole-file default.
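
The relative figures in this summary depend on which configuration is taken as the baseline. As a minimal sketch, the percentages can be recomputed from the "mean" row of the compile-time table and the gas_dyn row of the size table; the helper names below are hypothetical and the inputs are copied from those tables.

```python
# Mean compile times in seconds, copied from the "mean" row above.
MEAN_COMPILE_TIME = {
    "stock gcc": 3.11,
    "dragonegg/optzns": 2.73,
    "dragonegg": 2.02,
}

# gas_dyn executable sizes in bytes, copied from the size table above.
GAS_DYN_SIZE = {"stock gcc": 647864, "dragonegg/optzns": 58768}


def percent_less_time(candidate: float, baseline: float) -> float:
    """How much less time `candidate` takes, as a percentage of `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


def percent_more_time(candidate: float, baseline: float) -> float:
    """How much more time `candidate` takes, as a percentage of `baseline`."""
    return 100.0 * (candidate - baseline) / baseline


if __name__ == "__main__":
    t = MEAN_COMPILE_TIME
    # Stock dragonegg relative to dragonegg with optzns.
    print(round(percent_less_time(t["dragonegg"], t["dragonegg/optzns"]), 1))
    # Dragonegg with optzns relative to stock dragonegg.
    print(round(percent_more_time(t["dragonegg/optzns"], t["dragonegg"]), 1))
    # Dragonegg with optzns relative to stock gcc.
    print(round(percent_less_time(t["dragonegg/optzns"], t["stock gcc"]), 1))
    # Size ratio of the stock gcc gas_dyn binary to the optzns one.
    print(round(GAS_DYN_SIZE["stock gcc"] / GAS_DYN_SIZE["dragonegg/optzns"], 1))
```

The 26% figure corresponds to stock dragonegg taking less time than the optzns build; measured the other way round, the optzns build takes about 35% more time than stock dragonegg.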
>>
>> On Thu, Jun 09, 2011 at 09:51:51AM +0200, Duncan Sands wrote:
>>> Hi Jack, thanks for these numbers. Can you also please measure compile times?
>>> I'm thinking of enabling gcc optimizations by default, but I don't want to
>>> increase compile times, which means choosing a value for the
>>> -fplugin-arg-dragonegg-llvm-ir-optimize option that is low enough to get good
>>> compile times, yet high enough to get fast code. It would be great if you could
>>> play around with this to find a good choice.
>>>
>>> Best wishes, Duncan.
>>>
>>>> Current dragonegg svn has all of the -fplugin-arg-dragonegg-enable-gcc-optzns bugs for
>>>> usage with -ffast-math -O3 addressed except for those related to PR2314. Using the -fno-tree-vectorize
>>>> option, we can evaluate the current state of -fplugin-arg-dragonegg-enable-gcc-optzns with
>>>> the Polyhedron 2005 benchmarks compared to stock dragonegg and stock gcc 4.5.4. The runtime
>>>> benchmarks below show that we average slightly faster than stock gcc 4.5.4 and significantly
>>>> faster than stock dragonegg through the use of -fplugin-arg-dragonegg-enable-gcc-optzns.
>>>>
>>>> x86_64 darwin
>>>>
>>>> A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
>>>> B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
>>>> C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize
>>>>
>>>>
>>>> Benchmark   A) stock    B) gcc 4.5.4/      C) gcc 4.5.4/
>>>>             gcc 4.5.4   dragonegg/optzns   dragonegg
>>>>
>>>> ac               9.58        9.13              12.30
>>>> aermod          20.88       16.10              17.62
>>>> air              6.16        6.59               7.70
>>>> capacita        35.68       39.94              46.22
>>>> channel          2.03        2.04               1.96
>>>> doduc           28.28       28.43              30.41
>>>> fatigue          8.13        7.19              10.40
>>>> gas_dyn         10.10        9.83              11.73
>>>> induct          20.17       20.76              48.76
>>>> linpk           15.42       15.65              15.69
>>>> mdbx            11.42       11.73              12.07
>>>> nf              27.99       28.60              29.39
>>>> protein         38.36       39.08              39.98
>>>> rnflow          27.28       28.19              31.90
>>>> test_fpu        11.43       11.17              11.50
>>>> tfft             1.91        1.95               2.16
>>>>
>>>> Mean            12.72       12.62              14.71
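
The "Mean" row does not say which mean is used. For column A, the arithmetic mean of the sixteen run times is about 17.18, while the geometric mean comes out at about 12.72, matching the tabulated value, so the row appears to be a geometric mean. A minimal sketch of that check, with the column A values copied from the table and a hypothetical helper name:

```python
import math

# Column A (stock gcc 4.5.4) run times, copied from the table above.
STOCK_GCC_RUNTIMES = [
    9.58, 20.88, 6.16, 35.68, 2.03, 28.28, 8.13, 10.10,
    20.17, 15.42, 11.42, 27.99, 38.36, 27.28, 11.43, 1.91,
]


def geometric_mean(values):
    """Geometric mean computed as exp of the average of the logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    arithmetic = sum(STOCK_GCC_RUNTIMES) / len(STOCK_GCC_RUNTIMES)
    print(f"arithmetic mean: {arithmetic:.2f}")
    print(f"geometric mean:  {geometric_mean(STOCK_GCC_RUNTIMES):.2f}")
```

A geometric mean keeps long-running benchmarks such as protein and capacita from dominating the summary, which may be why it differs from the plain average used in the compile-time table.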
>>>>
>>>> Once vector_select() is implemented we can retest without -fno-tree-vectorize.
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev