[PATCH] D56772: [MIR] Add simple PRE pass to MachineCSE
Anton Afanasyev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 29 14:54:44 PST 2019
anton-afanasyev marked an inline comment as done.
anton-afanasyev added a comment.
In D56772#1369571 <https://reviews.llvm.org/D56772#1369571>, @hfinkel wrote:
> Have you measured the compile-time impact?
What do you mean by compile-time impact? I've run test suite of course, compared with baseline.
The results for **`exec_time`** and **`size..text`** metrics are like here:
> ~/llvm/test-suite/utils/compare.py --filter-short -m exec_time ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp3.json
...
Program results_rel_base results_rel_exp3 diff
MicroBench...RayFDRMultiThreaded/threads:32 1055.94 931.21 -11.8%
Bitcode/Be...rid/halide_bilateral_grid.test 30.26 26.70 -11.8%
MultiSourc...mbolics-flt/Symbolics-flt.test 1.10 0.98 -10.9%
MicroBench...XRayFDRMultiThreaded/threads:2 497.85 547.92 10.1%
MicroBench...lcalsCRaw.test:BM_EOS_RAW/5001 10.33 9.49 -8.2%
MicroBench...bda.test:BM_PIC_1D_LAMBDA/5001 75.75 81.43 7.5%
MicroBench...XRayFDRMultiThreaded/threads:8 1029.95 1100.16 6.8%
MicroBench...XRayFDRMultiThreaded/threads:4 626.46 667.17 6.5%
MicroBench...ambda.test:BM_EOS_LAMBDA/44217 87.21 92.66 6.2%
MultiSourc...flt/LoopRestructuring-flt.test 3.76 3.53 -6.2%
MicroBench...calsCRaw.test:BM_EOS_RAW/44217 85.94 90.96 5.8%
MicroBench...Lambda.test:BM_ADI_LAMBDA/5001 128.27 135.35 5.5%
MicroBench...est:BM_ReturnNeverInstrumented 1.60 1.68 5.3%
MicroBench....test:BM_MULADDSUB_LAMBDA/5001 14.14 14.84 4.9%
SingleSour...ncils/fdtd-apml/fdtd-apml.test 0.63 0.66 4.4%
...
> ~/llvm/test-suite/utils/compare.py --filter-short -m size..text ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp3.json
...
Program results_rel_base results_rel_exp3 diff
SingleSour...marks/Shootout/Shootout-random 658.00 642.00 -2.4%
SingleSour...ce/UnitTests/SignlessTypes/rem 4834.00 4930.00 2.0%
SingleSour...ootout-C++/Shootout-C++-random 818.00 802.00 -2.0%
MultiSource/Benchmarks/Olden/mst/mst 2658.00 2706.00 1.8%
SingleSource/Benchmarks/Misc/ffbench 2322.00 2290.00 -1.4%
MultiSource/Benchmarks/NPB-serial/is/is 4946.00 4898.00 -1.0%
MultiSource/Benchmarks/McCat/18-imp/imp 13266.00 13138.00 -1.0%
MultiSourc...rks/McCat/03-testtrie/testtrie 1810.00 1794.00 -0.9%
MultiSourc...chmarks/McCat/01-qbsort/qbsort 2066.00 2050.00 -0.8%
MultiSourc...iabench/g721/g721encode/encode 7394.00 7346.00 -0.6%
SingleSource/Benchmarks/Misc/fbench 2546.00 2562.00 0.6%
SingleSour...ce/Benchmarks/Misc/ReedSolomon 11170.00 11234.00 0.6%
SingleSource/Benchmarks/Stanford/Towers 2802.00 2786.00 -0.6%
MultiSourc...e/Applications/SIBsim4/SIBsim4 47666.00 47938.00 0.6%
MultiSourc...OE-ProxyApps-C/miniGMG/miniGMG 76258.00 76674.00 0.5%
Although I'm not sure `exec_time` is good metrics. I've also tried **`compile_time`**, never used it before:
~/llvm/test-suite/utils/compare.py --filter-short -m compile_time ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp3.json
...
Program results_rel_base results_rel_exp3 diff
SingleSour.../Benchmarks/Misc-C++/Large/ray 0.68 0.74 7.6%
MultiSourc...Prolangs-C/simulator/simulator 1.56 1.64 5.7%
MultiSource/Benchmarks/McCat/18-imp/imp 0.82 0.78 -5.3%
MultiSourc...OE-ProxyApps-C/RSBench/rsbench 0.88 0.84 -4.6%
MultiSource/Benchmarks/Ptrdist/bc/bc 2.42 2.52 4.1%
SingleSour...enchmarks/CoyoteBench/fftbench 1.08 1.03 -4.1%
SingleSour...UnitTests/Vectorizer/gcc-loops 1.42 1.48 3.9%
MultiSourc...chmarks/Prolangs-C++/city/city 3.20 3.08 -3.9%
MultiSource/Benchmarks/sim/sim 0.72 0.75 3.9%
MultiSourc...nchmarks/FreeBench/pifft/pifft 2.56 2.65 3.6%
MultiSource/Applications/spiff/spiff 1.58 1.64 3.5%
MultiSourc...lds-flt/CrossingThresholds-flt 3.80 3.94 3.5%
MultiSourc...s/Prolangs-C/compiler/compiler 0.84 0.81 -3.3%
MultiSourc...rks/Prolangs-C++/employ/employ 0.74 0.71 -3.3%
SingleSour...ks/Misc-C++/stepanov_container 3.01 3.11 3.2%
...
Also one can see many tests increased their **`machine-cse.NumCSEs`** statistics (meaning that PRE step created common subexpressions eliminated by CSE):
> ~/llvm/test-suite/utils/compare.py --filter-short -m machine-cse.NumCSEs ~/llvm/test-suite-results_deb_base.json ~/llvm/test-suite-results/results_deb_exp.json
...
Program results_deb_base results_deb_exp diff
SingleSour...ce/UnitTests/SignlessTypes/rem 1.00 19.00 1800.0%
MultiSource/Benchmarks/llubenchmark/llu 1.00 10.00 900.0%
MultiSource/Benchmarks/NPB-serial/is/is 1.00 9.00 800.0%
SingleSource/Benchmarks/Misc/fbench 1.00 7.00 600.0%
MultiSourc...OE-ProxyApps-C/XSBench/XSBench 2.00 8.00 300.0%
SingleSource/Benchmarks/Misc/ffbench 2.00 8.00 300.0%
SingleSour...chmarks/Misc-C++/stepanov_v1p2 1.00 3.00 200.0%
MultiSourc...arks/FreeBench/distray/distray 7.00 20.00 185.7%
MicroBench.../ImageProcessing/Dither/Dither 3.00 8.00 166.7%
MultiSourc...ench/telecomm-FFT/telecomm-fft 5.00 13.00 160.0%
MultiSourc...enchmarks/mafft/pairlocalalign 482.00 1214.00 151.9%
MultiSourc...aran/netbench-url/netbench-url 3.00 7.00 133.3%
MultiSourc...Benchmarks/SciMark2-C/scimark2 6.00 13.00 116.7%
MultiSourc...nch/mpeg2/mpeg2dec/mpeg2decode 57.00 115.00 101.8%
MultiSourc...chmarks/McCat/01-qbsort/qbsort 2.00 4.00 100.0%
...
================
Comment at: lib/CodeGen/MachineCSE.cpp:755
+// We use stronger conditions for PREed instrs rather than for CSE ones
+// to decrease number of PREed instrs that won't be CSEed.
+bool MachineCSE::isPRECandidate(MachineInstr *MI) {
----------------
hfinkel wrote:
> I think it would be useful to elaborate on under what conditions we create these PRE instructions that are not later CSEd. Do you know how often that happens (I assume that you can figure this out by looking at the statistics and comparing runs where this happens vs. where it doesn't)?
>
Thanks for pointing this out, I've cleaned this place. I've removed needless checks like `MI->isReturn()` after comparing statistics (no change).
Mostly, these checks are implicitly done while CSE work (not inside `isCSECandidate()`). I've left aux checks of virtual registers though -- they are also implicitly done.
I've also updated comment.
This looks artificially, but I'm going to refactor this place in the third (and last) step of this work -- merging PRE and CSE, so we need no different checks.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D56772/new/
https://reviews.llvm.org/D56772
More information about the llvm-commits
mailing list