[llvm] [RISCV] Support postRA vsetvl insertion pass (PR #70549)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Wed May 15 03:39:18 PDT 2024
lukel97 wrote:
I ran this on SPEC CPU 2017 (-O3 -march=rv64gv) and collected the dynamic instruction count (total insns), the dynamic vsetvl instruction count (vset), the dynamic count of non-x0,x0 vsetvls (vl set), and the number of spills inserted.
The changes are all within about ±1%, with the exception of 619.lbm_s/519.lbm_r, where there are 40% more vsetvls executed but a 23% improvement in dynamic instruction count. The improvement seems to be due to us spilling fewer vectors in a hot block.
On leela, there are 0.9% more vsetvls executed and also 0.9% more non-x0,x0 vsetvls, so it's likely that the new vsetvls aren't VL-preserving.
On x264, there are more non-x0,x0 vsetvls executed, but fewer vsetvls executed overall.
So overall I think this looks good.
| Program | vset (prera) | vset (postra) | vset (diff) | total insns (prera) | total insns (postra) | total insns (diff) | regalloc.NumSpills (prera) | regalloc.NumSpills (postra) | regalloc.NumSpills (diff) | vl set (prera) | vl set (postra) | vl set (diff) |
|:-------------------------------------------------------------------------------|--------------------:|---------------------:|-------------------:|---------------------------:|----------------------------:|--------------------------:|----------------------------------:|-----------------------------------:|---------------------------------:|----------------------:|-----------------------:|---------------------:|
| test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test | 3.93984e+09 | 5.49984e+09 | 0.395955 | 9.34441e+11 | 7.23061e+11 | -0.22621 | 208 | 179 | -0.139423 | 1662 | 1666 | 0.00240674 |
| test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test | 4.95512e+08 | 6.90512e+08 | 0.393532 | 1.17465e+11 | 9.08479e+10 | -0.226599 | 208 | 180 | -0.134615 | 1142 | 1146 | 0.00350263 |
| test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test | 4.22353e+08 | 4.26072e+08 | 0.00880622 | 4.674e+11 | 4.67404e+11 | 7.95723e-06 | 358 | 358 | 0 | 3.91329e+08 | 3.94923e+08 | 0.00918232 |
| test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test | 4.22353e+08 | 4.26072e+08 | 0.00880622 | 4.674e+11 | 4.67404e+11 | 7.95723e-06 | 358 | 358 | 0 | 3.91329e+08 | 3.94923e+08 | 0.00918232 |
| test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test | 1.88329e+09 | 1.88349e+09 | 0.000103542 | 7.20803e+11 | 7.20803e+11 | 3.96615e-07 | 13887 | 13888 | 7.20098e-05 | 1.4395e+09 | 1.43956e+09 | 4.23063e-05 |
| test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test | 1.73542e+07 | 1.73559e+07 | 9.89384e-05 | 1.08127e+10 | 1.08127e+10 | 1.69088e-06 | 13720 | 13728 | 0.00058309 | 1.69636e+07 | 1.69654e+07 | 0.000108586 |
| test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test | 1.73542e+07 | 1.73559e+07 | 9.89384e-05 | 1.08124e+10 | 1.08124e+10 | 1.69092e-06 | 13720 | 13728 | 0.00058309 | 1.69636e+07 | 1.69654e+07 | 0.000108586 |
| test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test | 6.54343e+07 | 6.54382e+07 | 5.89141e-05 | 3.15461e+10 | 3.15416e+10 | -0.000142572 | 1916 | 1902 | -0.00730689 | 5.5646e+07 | 5.56494e+07 | 6.2089e-05 |
| test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test | 1.7041e+09 | 1.70412e+09 | 1.18502e-05 | 1.50246e+11 | 1.50246e+11 | -4.98108e-07 | 44229 | 44232 | 6.78288e-05 | 1.69819e+09 | 1.69819e+09 | 6.17128e-07 |
| test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test | 2.26807e+08 | 2.26807e+08 | 7.93626e-07 | 1.41909e+11 | 1.4195e+11 | 0.000287282 | 694 | 698 | 0.00576369 | 2.26807e+08 | 2.26807e+08 | 7.93626e-07 |
| test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test | 2.26807e+08 | 2.26807e+08 | 7.93626e-07 | 1.41903e+11 | 1.41943e+11 | 0.000287296 | 694 | 698 | 0.00576369 | 2.26807e+08 | 2.26807e+08 | 7.93626e-07 |
| test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test | 1.86518e+10 | 1.86518e+10 | 1.0814e-07 | 2.47888e+11 | 2.47888e+11 | 1.17916e-08 | 4634 | 4670 | 0.00776867 | 2.62581e+07 | 2.62609e+07 | 0.000105491 |
| test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test | 1.86518e+10 | 1.86518e+10 | 1.0814e-07 | 2.47888e+11 | 2.47888e+11 | 1.17916e-08 | 4634 | 4670 | 0.00776867 | 2.62581e+07 | 2.62609e+07 | 0.000105491 |
| test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test | 218721 | 218721 | 0 | 1.79594e+09 | 1.79594e+09 | 0 | 4291 | 4291 | 0 | 218718 | 218718 | 0 |
| test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test | 6.19906e+07 | 6.19906e+07 | 0 | 4.72968e+10 | 4.72968e+10 | 0 | 301 | 301 | 0 | 5.57433e+07 | 5.57433e+07 | 0 |
| test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test | 2.36169e+07 | 2.36169e+07 | 0 | 1.53744e+11 | 1.53744e+11 | 0 | 123 | 123 | 0 | 2.36169e+07 | 2.36169e+07 | 0 |
| test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test | 6.19906e+07 | 6.19906e+07 | 0 | 4.72968e+10 | 4.72968e+10 | 0 | 301 | 301 | 0 | 5.57433e+07 | 5.57433e+07 | 0 |
| test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test | 2.23429e+09 | 2.23429e+09 | 0 | 4.08002e+11 | 4.08002e+11 | 0 | 749 | 749 | 0 | 2.23429e+09 | 2.23429e+09 | 0 |
| test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test | 218721 | 218721 | 0 | 1.79594e+09 | 1.79594e+09 | 0 | 4291 | 4291 | 0 | 218718 | 218718 | 0 |
| test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test | 2.23429e+09 | 2.23429e+09 | 0 | 4.08002e+11 | 4.08002e+11 | 0 | 749 | 749 | 0 | 2.23429e+09 | 2.23429e+09 | 0 |
| test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test | 9.48185e+07 | 9.48185e+07 | 0 | 2.1198e+11 | 2.11983e+11 | 1.13134e-05 | 6743 | 6744 | 0.000148302 | 9.46089e+07 | 9.46089e+07 | 0 |
| test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test | 2.36169e+07 | 2.36169e+07 | 0 | 1.53744e+11 | 1.53744e+11 | 0 | 123 | 123 | 0 | 2.36169e+07 | 2.36169e+07 | 0 |
| test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test | 4.64859e+08 | 4.64859e+08 | -4.30238e-09 | 2.89851e+11 | 2.89851e+11 | 2.99464e-09 | 1604 | 1604 | 0 | 4.51635e+08 | 4.51635e+08 | -4.42836e-09 |
| test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test | 4.64859e+08 | 4.64859e+08 | -4.30238e-09 | 2.89851e+11 | 2.89851e+11 | 3.40174e-09 | 1604 | 1604 | 0 | 4.51635e+08 | 4.51635e+08 | -4.42836e-09 |
| test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test | 6.96396e+08 | 6.96396e+08 | -1.37853e-07 | 4.42779e+11 | 4.42818e+11 | 8.90763e-05 | 359 | 358 | -0.00278552 | 6.37384e+08 | 6.37384e+08 | 0 |
| test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test | 6.60366e+08 | 6.60366e+08 | -1.45374e-07 | 4.10743e+11 | 4.10779e+11 | 8.81044e-05 | 359 | 358 | -0.00278552 | 6.04725e+08 | 6.04725e+08 | 0 |
| test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test | 5.04246e+09 | 5.02065e+09 | -0.00432564 | 2.19801e+11 | 2.19701e+11 | -0.000456991 | 1894 | 1894 | 0 | 1.16667e+09 | 1.1786e+09 | 0.0102294 |
| test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test | 5.04246e+09 | 5.02065e+09 | -0.00432564 | 2.19801e+11 | 2.19701e+11 | -0.000456991 | 1894 | 1894 | 0 | 1.16667e+09 | 1.1786e+09 | 0.0102294 |
| Geomean difference | | | 0.0243876 | | | -0.0181787 | | | -0.0099226 | | | 0.00161097 |
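
For reference, the per-benchmark diff columns above are consistent with a plain relative change, (postra - prera) / prera. Here is a minimal sketch that rechecks the 619.lbm_s row under that assumption. The helper name `relative_diff` is made up for illustration, and the inputs are the rounded values as printed in the table, so the results only match to a few significant digits. It doesn't try to reproduce the geomean row.

```python
def relative_diff(prera: float, postra: float) -> float:
    """Relative change going from the prera value to the postra value."""
    return (postra - prera) / prera


# 619.lbm_s values copied from the table above (rounded as printed there).
vset_diff = relative_diff(3.93984e9, 5.49984e9)      # dynamic vsetvl count
total_diff = relative_diff(9.34441e11, 7.23061e11)   # dynamic instruction count
spill_diff = relative_diff(208, 179)                 # regalloc.NumSpills

print(f"vset diff:        {vset_diff:+.4f}")
print(f"total insns diff: {total_diff:+.4f}")
print(f"NumSpills diff:   {spill_diff:+.4f}")
```

A positive value means the postra configuration executed (or inserted) more than prera, so the roughly +40% vset and -23% total insns figures for lbm fall out of the same formula.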
https://github.com/llvm/llvm-project/pull/70549