[PATCH] D56772: [MIR] Add simple PRE pass to MachineCSE

Tue Jan 29 14:54:44 PST 2019

anton-afanasyev marked an inline comment as done.
anton-afanasyev added a comment.

In D56772#1369571 <https://reviews.llvm.org/D56772#1369571>, @hfinkel wrote:

> Have you measured the compile-time impact?

What do you mean by compile-time impact? I've run test suite of course, compared with baseline.
The results for **`exec_time`** and **`size..text`** metrics are like here:

  > ~/llvm/test-suite/utils/compare.py  --filter-short -m exec_time ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp3.json 
  ...
  Program                                         results_rel_base  results_rel_exp3 diff  

  MicroBench...RayFDRMultiThreaded/threads:32   1055.94           931.21            -11.8%
  Bitcode/Be...rid/halide_bilateral_grid.test    30.26             26.70            -11.8%
  MultiSourc...mbolics-flt/Symbolics-flt.test     1.10              0.98            -10.9%
  MicroBench...XRayFDRMultiThreaded/threads:2   497.85            547.92            10.1% 
  MicroBench...lcalsCRaw.test:BM_EOS_RAW/5001    10.33              9.49            -8.2% 
  MicroBench...bda.test:BM_PIC_1D_LAMBDA/5001    75.75             81.43             7.5% 
  MicroBench...XRayFDRMultiThreaded/threads:8   1029.95           1100.16            6.8% 
  MicroBench...XRayFDRMultiThreaded/threads:4   626.46            667.17             6.5% 
  MicroBench...ambda.test:BM_EOS_LAMBDA/44217    87.21             92.66             6.2% 
  MultiSourc...flt/LoopRestructuring-flt.test     3.76              3.53            -6.2% 
  MicroBench...calsCRaw.test:BM_EOS_RAW/44217    85.94             90.96             5.8% 
  MicroBench...Lambda.test:BM_ADI_LAMBDA/5001   128.27            135.35             5.5% 
  MicroBench...est:BM_ReturnNeverInstrumented     1.60              1.68             5.3% 
  MicroBench....test:BM_MULADDSUB_LAMBDA/5001    14.14             14.84             4.9% 
  SingleSour...ncils/fdtd-apml/fdtd-apml.test     0.63              0.66             4.4%
  ...
  > ~/llvm/test-suite/utils/compare.py  --filter-short -m size..text ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp3.json 
  ...
  Program                                         results_rel_base  results_rel_exp3 diff 

  SingleSour...marks/Shootout/Shootout-random   658.00            642.00            -2.4%
  SingleSour...ce/UnitTests/SignlessTypes/rem   4834.00           4930.00            2.0%
  SingleSour...ootout-C++/Shootout-C++-random   818.00            802.00            -2.0%
  MultiSource/Benchmarks/Olden/mst/mst          2658.00           2706.00            1.8%
  SingleSource/Benchmarks/Misc/ffbench          2322.00           2290.00           -1.4%
  MultiSource/Benchmarks/NPB-serial/is/is       4946.00           4898.00           -1.0%
  MultiSource/Benchmarks/McCat/18-imp/imp       13266.00          13138.00          -1.0%
  MultiSourc...rks/McCat/03-testtrie/testtrie   1810.00           1794.00           -0.9%
  MultiSourc...chmarks/McCat/01-qbsort/qbsort   2066.00           2050.00           -0.8%
  MultiSourc...iabench/g721/g721encode/encode   7394.00           7346.00           -0.6%
  SingleSource/Benchmarks/Misc/fbench           2546.00           2562.00            0.6%
  SingleSour...ce/Benchmarks/Misc/ReedSolomon   11170.00          11234.00           0.6%
  SingleSource/Benchmarks/Stanford/Towers       2802.00           2786.00           -0.6%
  MultiSourc...e/Applications/SIBsim4/SIBsim4   47666.00          47938.00           0.6%
  MultiSourc...OE-ProxyApps-C/miniGMG/miniGMG   76258.00          76674.00           0.5%

Although I'm not sure `exec_time` is good metrics. I've also tried **`compile_time`**, never used it before:

  ~/llvm/test-suite/utils/compare.py  --filter-short -m compile_time ~/llvm/test-suite-results/results_rel_base.json ~/llvm/test-suite-results/results_rel_exp3.json 
  ...
  Program                                         results_rel_base  results_rel_exp3 diff 

  SingleSour.../Benchmarks/Misc-C++/Large/ray     0.68              0.74             7.6%
  MultiSourc...Prolangs-C/simulator/simulator     1.56              1.64             5.7%
  MultiSource/Benchmarks/McCat/18-imp/imp         0.82              0.78            -5.3%
  MultiSourc...OE-ProxyApps-C/RSBench/rsbench     0.88              0.84            -4.6%
  MultiSource/Benchmarks/Ptrdist/bc/bc            2.42              2.52             4.1%
  SingleSour...enchmarks/CoyoteBench/fftbench     1.08              1.03            -4.1%
  SingleSour...UnitTests/Vectorizer/gcc-loops     1.42              1.48             3.9%
  MultiSourc...chmarks/Prolangs-C++/city/city     3.20              3.08            -3.9%
  MultiSource/Benchmarks/sim/sim                  0.72              0.75             3.9%
  MultiSourc...nchmarks/FreeBench/pifft/pifft     2.56              2.65             3.6%
  MultiSource/Applications/spiff/spiff            1.58              1.64             3.5%
  MultiSourc...lds-flt/CrossingThresholds-flt     3.80              3.94             3.5%
  MultiSourc...s/Prolangs-C/compiler/compiler     0.84              0.81            -3.3%
  MultiSourc...rks/Prolangs-C++/employ/employ     0.74              0.71            -3.3%
  SingleSour...ks/Misc-C++/stepanov_container     3.01              3.11             3.2%
  ...

Also one can see many tests increased their **`machine-cse.NumCSEs`** statistics (meaning that PRE step created common subexpressions eliminated by CSE):

  > ~/llvm/test-suite/utils/compare.py  --filter-short -m machine-cse.NumCSEs ~/llvm/test-suite-results_deb_base.json ~/llvm/test-suite-results/results_deb_exp.json
  ...
  Program                                         results_deb_base  results_deb_exp diff   

  SingleSour...ce/UnitTests/SignlessTypes/rem     1.00             19.00           1800.0%
  MultiSource/Benchmarks/llubenchmark/llu         1.00             10.00           900.0% 
  MultiSource/Benchmarks/NPB-serial/is/is         1.00              9.00           800.0% 
  SingleSource/Benchmarks/Misc/fbench             1.00              7.00           600.0% 
  MultiSourc...OE-ProxyApps-C/XSBench/XSBench     2.00              8.00           300.0% 
  SingleSource/Benchmarks/Misc/ffbench            2.00              8.00           300.0% 
  SingleSour...chmarks/Misc-C++/stepanov_v1p2     1.00              3.00           200.0% 
  MultiSourc...arks/FreeBench/distray/distray     7.00             20.00           185.7% 
  MicroBench.../ImageProcessing/Dither/Dither     3.00              8.00           166.7% 
  MultiSourc...ench/telecomm-FFT/telecomm-fft     5.00             13.00           160.0% 
  MultiSourc...enchmarks/mafft/pairlocalalign   482.00            1214.00          151.9% 
  MultiSourc...aran/netbench-url/netbench-url     3.00              7.00           133.3% 
  MultiSourc...Benchmarks/SciMark2-C/scimark2     6.00             13.00           116.7% 
  MultiSourc...nch/mpeg2/mpeg2dec/mpeg2decode    57.00            115.00           101.8% 
  MultiSourc...chmarks/McCat/01-qbsort/qbsort     2.00              4.00           100.0%
  ...

================
Comment at: lib/CodeGen/MachineCSE.cpp:755
+// We use stronger conditions for PREed instrs rather than for CSE ones
+// to decrease number of PREed instrs that won't be CSEed.
+bool MachineCSE::isPRECandidate(MachineInstr *MI) {
----------------
hfinkel wrote:
> I think it would be useful to elaborate on under what conditions we create these PRE instructions that are not later CSEd. Do you know how often that happens (I assume that you can figure this out by looking at the statistics and comparing runs where this happens vs. where it doesn't)?
> 
Thanks for pointing this out, I've cleaned this place. I've removed needless checks like `MI->isReturn()` after comparing statistics (no change).

Mostly, these checks are implicitly done while CSE work (not inside `isCSECandidate()`). I've left aux checks of virtual registers though -- they are also implicitly done.

I've also updated comment.

This looks artificially, but I'm going to refactor this place in the third (and last) step of this work -- merging PRE and CSE, so we need no different checks.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56772/new/

https://reviews.llvm.org/D56772