[llvm] [RISCV] Support postRA vsetvl insertion pass (PR #70549)

Luke Lau via llvm-commits llvm-commits at lists.llvm.org
Wed May 15 03:39:18 PDT 2024


lukel97 wrote:

I ran this on SPEC CPU 2017 (-O3 -march=rv64gv) and collected the dynamic instruction count (total insns), the dynamic vsetvl instruction count (vset), the dynamic count of vsetvls that are not of the x0,x0 form (vl set), and the number of spills inserted.

The changes are all within about ±1%, with the exception of 619.lbm_s/519.lbm_r, where there are 40% more vsetvls executed but a 23% improvement in dynamic instruction count. The improvement seems to be due to us spilling fewer vectors in a hot block.

On leela, there are 0.9% more vsetvls executed and 0.9% more non-x0,x0 vsetvls, so it's likely that the new vsetvls aren't VL-preserving.
On x264, there are more non-x0,x0 vsetvls executed, but fewer vsetvls executed overall.

So overall I think this looks good.
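
For anyone reading the numbers below: the `diff` columns are consistent with a relative change of postra over prera (so 0.395955 is roughly +40%), and the "Geomean difference" row is consistent with the geometric mean of the postra/prera ratios minus one. The following is only a minimal sketch under those assumptions, not the script that produced the table; `relative_diff` and `geomean_diff` are hypothetical helper names.

```python
import math


def relative_diff(prera: float, postra: float) -> float:
    """Relative change of postra over prera, e.g. 0.40 means 40% more."""
    return postra / prera - 1.0


def geomean_diff(pairs):
    """Geometric mean of the postra/prera ratios, minus one.

    Assumption: pairs containing a zero are skipped so the ratio and
    its logarithm are always defined.
    """
    logs = [math.log(post / pre) for pre, post in pairs if pre > 0 and post > 0]
    return math.exp(sum(logs) / len(logs)) - 1.0


# 619.lbm_s values copied from the table.
lbm_vset = relative_diff(3.93984e+09, 5.49984e+09)
lbm_insns = relative_diff(9.34441e+11, 7.23061e+11)
print(f"{lbm_vset:+.1%} vsetvls, {lbm_insns:+.1%} total insns")
```

Passing every (prera, postra) pair of one metric to `geomean_diff` should land close to the corresponding entry in the last row of the table, up to the rounding of the printed values.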

| Program                                                                        |        vset (prera) |        vset (postra) |        vset (diff) |        total insns (prera) |        total insns (postra) |        total insns (diff) |        regalloc.NumSpills (prera) |        regalloc.NumSpills (postra) |        regalloc.NumSpills (diff) |        vl set (prera) |        vl set (postra) |        vl set (diff) |
|:-------------------------------------------------------------------------------|--------------------:|---------------------:|-------------------:|---------------------------:|----------------------------:|--------------------------:|----------------------------------:|-----------------------------------:|---------------------------------:|----------------------:|-----------------------:|---------------------:|
| test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test              |         3.93984e+09 |          5.49984e+09 |        0.395955    |                9.34441e+11 |                 7.23061e+11 |              -0.22621     |                               208 |                                179 |                     -0.139423    |        1662           |         1666           |          0.00240674  |
| test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test               |         4.95512e+08 |          6.90512e+08 |        0.393532    |                1.17465e+11 |                 9.08479e+10 |              -0.226599    |                               208 |                                180 |                     -0.134615    |        1142           |         1146           |          0.00350263  |
| test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test          |         4.22353e+08 |          4.26072e+08 |        0.00880622  |                4.674e+11   |                 4.67404e+11 |               7.95723e-06 |                               358 |                                358 |                      0           |           3.91329e+08 |            3.94923e+08 |          0.00918232  |
| test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test         |         4.22353e+08 |          4.26072e+08 |        0.00880622  |                4.674e+11   |                 4.67404e+11 |               7.95723e-06 |                               358 |                                358 |                      0           |           3.91329e+08 |            3.94923e+08 |          0.00918232  |
| test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test       |         1.88329e+09 |          1.88349e+09 |        0.000103542 |                7.20803e+11 |                 7.20803e+11 |               3.96615e-07 |                             13887 |                              13888 |                      7.20098e-05 |           1.4395e+09  |            1.43956e+09 |          4.23063e-05 |
| test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test              |         1.73542e+07 |          1.73559e+07 |        9.89384e-05 |                1.08127e+10 |                 1.08127e+10 |               1.69088e-06 |                             13720 |                              13728 |                      0.00058309  |           1.69636e+07 |            1.69654e+07 |          0.000108586 |
| test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test             |         1.73542e+07 |          1.73559e+07 |        9.89384e-05 |                1.08124e+10 |                 1.08124e+10 |               1.69092e-06 |                             13720 |                              13728 |                      0.00058309  |           1.69636e+07 |            1.69654e+07 |          0.000108586 |
| test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test         |         6.54343e+07 |          6.54382e+07 |        5.89141e-05 |                3.15461e+10 |                 3.15416e+10 |              -0.000142572 |                              1916 |                               1902 |                     -0.00730689  |           5.5646e+07  |            5.56494e+07 |          6.2089e-05  |
| test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test         |         1.7041e+09  |          1.70412e+09 |        1.18502e-05 |                1.50246e+11 |                 1.50246e+11 |              -4.98108e-07 |                             44229 |                              44232 |                      6.78288e-05 |           1.69819e+09 |            1.69819e+09 |          6.17128e-07 |
| test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test     |         2.26807e+08 |          2.26807e+08 |        7.93626e-07 |                1.41909e+11 |                 1.4195e+11  |               0.000287282 |                               694 |                                698 |                      0.00576369  |           2.26807e+08 |            2.26807e+08 |          7.93626e-07 |
| test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test      |         2.26807e+08 |          2.26807e+08 |        7.93626e-07 |                1.41903e+11 |                 1.41943e+11 |               0.000287296 |                               694 |                                698 |                      0.00576369  |           2.26807e+08 |            2.26807e+08 |          7.93626e-07 |
| test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test       |         1.86518e+10 |          1.86518e+10 |        1.0814e-07  |                2.47888e+11 |                 2.47888e+11 |               1.17916e-08 |                              4634 |                               4670 |                      0.00776867  |           2.62581e+07 |            2.62609e+07 |          0.000105491 |
| test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test      |         1.86518e+10 |          1.86518e+10 |        1.0814e-07  |                2.47888e+11 |                 2.47888e+11 |               1.17916e-08 |                              4634 |                               4670 |                      0.00776867  |           2.62581e+07 |            2.62609e+07 |          0.000105491 |
| test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  |    218721           |     218721           |        0           |                1.79594e+09 |                 1.79594e+09 |               0           |                              4291 |                               4291 |                      0           |      218718           |       218718           |          0           |
| test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test               |         6.19906e+07 |          6.19906e+07 |        0           |                4.72968e+10 |                 4.72968e+10 |               0           |                               301 |                                301 |                      0           |           5.57433e+07 |            5.57433e+07 |          0           |
| test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test             |         2.36169e+07 |          2.36169e+07 |        0           |                1.53744e+11 |                 1.53744e+11 |               0           |                               123 |                                123 |                      0           |           2.36169e+07 |            2.36169e+07 |          0           |
| test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test                |         6.19906e+07 |          6.19906e+07 |        0           |                4.72968e+10 |                 4.72968e+10 |               0           |                               301 |                                301 |                      0           |           5.57433e+07 |            5.57433e+07 |          0           |
| test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test               |         2.23429e+09 |          2.23429e+09 |        0           |                4.08002e+11 |                 4.08002e+11 |               0           |                               749 |                                749 |                      0           |           2.23429e+09 |            2.23429e+09 |          0           |
| test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test |    218721           |     218721           |        0           |                1.79594e+09 |                 1.79594e+09 |               0           |                              4291 |                               4291 |                      0           |      218718           |       218718           |          0           |
| test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test              |         2.23429e+09 |          2.23429e+09 |        0           |                4.08002e+11 |                 4.08002e+11 |               0           |                               749 |                                749 |                      0           |           2.23429e+09 |            2.23429e+09 |          0           |
| test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test             |         9.48185e+07 |          9.48185e+07 |        0           |                2.1198e+11  |                 2.11983e+11 |               1.13134e-05 |                              6743 |                               6744 |                      0.000148302 |           9.46089e+07 |            9.46089e+07 |          0           |
| test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test              |         2.36169e+07 |          2.36169e+07 |        0           |                1.53744e+11 |                 1.53744e+11 |               0           |                               123 |                                123 |                      0           |           2.36169e+07 |            2.36169e+07 |          0           |
| test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  |         4.64859e+08 |          4.64859e+08 |       -4.30238e-09 |                2.89851e+11 |                 2.89851e+11 |               2.99464e-09 |                              1604 |                               1604 |                      0           |           4.51635e+08 |            4.51635e+08 |         -4.42836e-09 |
| test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test |         4.64859e+08 |          4.64859e+08 |       -4.30238e-09 |                2.89851e+11 |                 2.89851e+11 |               3.40174e-09 |                              1604 |                               1604 |                      0           |           4.51635e+08 |            4.51635e+08 |         -4.42836e-09 |
| test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test |         6.96396e+08 |          6.96396e+08 |       -1.37853e-07 |                4.42779e+11 |                 4.42818e+11 |               8.90763e-05 |                               359 |                                358 |                     -0.00278552  |           6.37384e+08 |            6.37384e+08 |          0           |
| test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test  |         6.60366e+08 |          6.60366e+08 |       -1.45374e-07 |                4.10743e+11 |                 4.10779e+11 |               8.81044e-05 |                               359 |                                358 |                     -0.00278552  |           6.04725e+08 |            6.04725e+08 |          0           |
| test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test            |         5.04246e+09 |          5.02065e+09 |       -0.00432564  |                2.19801e+11 |                 2.19701e+11 |              -0.000456991 |                              1894 |                               1894 |                      0           |           1.16667e+09 |            1.1786e+09  |          0.0102294   |
| test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test           |         5.04246e+09 |          5.02065e+09 |       -0.00432564  |                2.19801e+11 |                 2.19701e+11 |              -0.000456991 |                              1894 |                               1894 |                      0           |           1.16667e+09 |            1.1786e+09  |          0.0102294   |
| Geomean difference                                                             |                 |                  |        0.0243876   |                        |                         |              -0.0181787   |                               |                                |                     -0.0099226   |                   |                    |          0.00161097  |


https://github.com/llvm/llvm-project/pull/70549

