[llvm] [MCP] Move dependencies if they block copy propagation (PR #105562)

Mon Aug 26 02:03:04 PDT 2024

qcolombet wrote:

> I have done some benchmarking.
> 
> I used this cmake command for the benchmarks:
> 
> ```
> cmake -DCMAKE_C_COMPILER=llvm-project/build/bin/clang  -DCMAKE_CXX_COMPILER=llvm-project/build/bin/clang++ -DTEST_SUITE_BENCHMARKING_ONLY=1 .. -GNinja
> ```
> 
> I used this command to compile:
> 
> ```
> ninja -j1
> ```
> 
> This is the llvm-lit invcoaction:
> 
> ```
> llvm-lit -v -j 1 -o resnew21.json . && /home/spaits/repo/llvm-project/build/bin/llvm-lit -v -j 1 -o resnew22.json .
> ```
> 
> I ran compare with this command:
> 
> ```
> python3 ../utils/compare.py resnew21.json resnew22.json vs resold21.json resold22.json
> ```
> 
> And here are the results for compile time:
> 
> ```
> Tests: 1173
> Metric: compile_time
> 
> /home/spaits/repo/llvm-test-suite/build/../utils/compare.py:206: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
>   name0 = names[0]
> Program                                       compile_time             
>                                               lhs          rhs    diff 
> SingleSource/Benchmarks/Misc/pi                 0.14         0.16 13.2%
> SingleSource/Benchmarks/Misc/fp-convert         0.24         0.26  6.1%
> SingleSour...bench/stencils/fdtd-2d/fdtd-2d     1.63         1.73  6.0%
> SingleSour...h/stencils/jacobi-1d/jacobi-1d     0.93         0.98  5.7%
> SingleSour...near-algebra/kernels/bicg/bicg     0.81         0.86  5.2%
> SingleSour...h/linear-algebra/solvers/lu/lu     1.18         1.22  3.4%
> SingleSour...-algebra/solvers/durbin/durbin     0.89         0.92  3.4%
> MultiSourc...Benchmarks/Olden/health/health     1.38         1.43  3.4%
> SingleSource/Benchmarks/Misc/salsa20            0.56         0.58  3.1%
> SingleSour...rks/Polybench/stencils/adi/adi     0.95         0.98  2.8%
> SingleSour...bench/medley/nussinov/nussinov     1.13         1.16  2.6%
> SingleSource/Benchmarks/Misc/mandel-2           0.28         0.29  2.4%
> SingleSour...Benchmarks/Misc/matmul_f64_4x4     0.30         0.31  2.3%
> SingleSource/Benchmarks/SmallPT/smallpt         1.95         2.00  2.3%
> MultiSourc...arks/FreeBench/distray/distray     1.06         1.08  2.2%
>                            Geomean difference                     -1.8%
>       compile_time                         
> l/r            lhs          rhs        diff
> count  1173.000000  1173.000000  284.000000
> mean   5.521519     5.432766    -0.018112  
> std    35.607681    35.240664    0.026822  
> min    0.000000     0.000000    -0.093135  
> 25%    0.000000     0.000000    -0.033920  
> 50%    0.000000     0.000000    -0.016928  
> 75%    0.000000     0.000000    -0.004431  
> max    891.948600   888.081400   0.132171  
> ```
> 
> And for execution time:
> 
> ```
> Metric: exec_time
> 
> /home/spaits/repo/llvm-test-suite/build/../utils/compare.py:206: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
>   name0 = names[0]
> Program                                       exec_time               
>                                               lhs       rhs    diff   
> SingleSour...Benchmarks/Stanford/Oscar.test     0.00      0.00    inf%
> SingleSour...ncils/jacobi-1d/jacobi-1d.test     0.00      0.00    inf%
> MultiSourc.../Prolangs-C/bison/mybison.test     0.00      0.00    inf%
> MultiSourc...adpcm/rawcaudio/rawcaudio.test     0.00      0.00  140.0%
> MultiSourc...abench/jpeg/jpeg-6a/cjpeg.test     0.00      0.00  100.0%
> MultiSourc...ks/Prolangs-C++/city/city.test     0.00      0.00   61.5%
> MultiSourc...cCat/03-testtrie/testtrie.test     0.00      0.00   60.0%
> SingleSour...ootout/Shootout-ackermann.test     0.01      0.01   51.9%
> SingleSour...s/BenchmarkGame/recursive.test     0.42      0.59   40.0%
> SingleSour...tout-C++/Shootout-C++-ary.test     0.01      0.01   35.0%
> SingleSour...++/Shootout-C++-ackermann.test     0.66      0.87   31.0%
> SingleSour...out-C++/Shootout-C++-ary2.test     0.01      0.01   25.8%
> MicroBench...st:BM_DIFF_PREDICT_LAMBDA/5001    23.13     29.09   25.7%
> MultiSourc...telecomm-FFT/telecomm-fft.test     0.01      0.01   24.6%
> MultiSourc...lications/ClamAV/clamscan.test     0.04      0.05   21.8%
>                            Geomean difference                  -100.0%
> /home/spaits/.local/lib/python3.10/site-packages/pandas/core/nanops.py:1016: RuntimeWarning: invalid value encountered in subtract
>   sqr = _ensure_numeric((avg - values) ** 2)
>            exec_time                            
> l/r              lhs            rhs         diff
> count  1173.000000    1173.000000    1149.000000
> mean   2071.330440    2088.736758    inf        
> std    25427.791777   25650.260153  NaN         
> min    0.000000       0.000000      -1.000000   
> 25%    1.204800       1.205000      -0.010028   
> 50%    5.687519       5.608971       0.000000   
> 75%    133.613683     132.687198     0.017685   
> max    643010.637615  646757.697936  inf 
> ```
> 
> text section sizes:
> 
> ```
> Tests: 1173
> Metric: size..text
> 
> /home/spaits/repo/llvm-test-suite/build/../utils/compare.py:206: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
>   name0 = names[0]
> Program                                       size..text                
>                                               lhs        rhs       diff 
> MultiSourc...nchmarks/Olden/treeadd/treeadd      589.00     605.00  2.7%
> SingleSour...chmarks/BenchmarkGame/fannkuch     1619.00    1635.00  1.0%
> MultiSourc...enchmarks/McCat/17-bintr/bintr     1781.00    1797.00  0.9%
> MultiSourc...work-patricia/network-patricia     1999.00    2015.00  0.8%
> MultiSourc...chmarks/McCat/04-bisect/bisect     3261.00    3277.00  0.5%
> SingleSource/Benchmarks/McGill/chomp            6641.00    6673.00  0.5%
> SingleSour.../Benchmarks/Misc-C++/Large/ray     4577.00    4593.00  0.3%
> MultiSource/Benchmarks/Olden/bh/bh             12026.00   12058.00  0.3%
> MultiSourc...e/Applications/SIBsim4/SIBsim4    45430.00   45494.00  0.1%
> Bitcode/Be...hmarks/Halide/blur/halide_blur    34776.00   34824.00  0.1%
> MultiSource/Benchmarks/sim/sim                 17775.00   17791.00  0.1%
> MultiSourc...e/Benchmarks/MallocBench/gs/gs   152609.00  152737.00  0.1%
> Bitcode/Be...ral_grid/halide_bilateral_grid    58680.00   58728.00  0.1%
> MultiSourc...hmarks/MallocBench/cfrac/cfrac    20661.00   20677.00  0.1%
> MultiSourc...e/Applications/minisat/minisat    21741.00   21757.00  0.1%
>                            Geomean difference                       0.0%
>           size..text                           
> l/r              lhs            rhs        diff
> count  1173.000000    1173.000000    318.000000
> mean   17475.696505   17479.068201   0.000287  
> std    77419.292919   77433.314426   0.001806  
> min    0.000000       0.000000       0.000000  
> 25%    0.000000       0.000000       0.000000  
> 50%    0.000000       0.000000       0.000000  
> 75%    786.000000     786.000000     0.000000  
> max    906913.000000  907121.000000  0.027165 
> ```
> 
> I should do some fine tuning:
> 
> * Only enable this optimization from O2 or O3 or Os.
> * If O3 or O3 is enabled, then take instruction latencies into account when doing the instruction moving.
> * If Os (size opt if I am correct) is enabled then do the optimization regardless of latencies.
> 
> What do you think would this PR be fine for O3 or O2 or Os?

Couple of comments:
- If I am not mistaken what you are reporting is not CTMark. Your cmake command should have `-DTEST_SUITE_SUBDIRS=CTMark`
- Usually the baseline is on the lhs, but here it is the opposite. That's fine but it is surprising at first.
- How was your compiler compiled? (Release, release + asserts, ...?)

Could you re-run with CTMark?
The current tests are too small to be relevant (sub 1 second for most of them).

https://github.com/llvm/llvm-project/pull/105562