[llvm] [RISCV] Move VMV0 elimination past machine SSA opts (PR #126850)

Luke Lau via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 19 01:13:21 PST 2025


lukel97 wrote:

> I see two paths forward:
> 
> 1. Performance data which shows this is strongly net positive.

I haven't been able to get any dynamic data yet, but I got some static results which line up with what I was expecting. There's a significant increase in the number of copies coalesced by MachineCSE:

```
Metric: machine-cse.NumCoalesces

Program            machine-cse.NumCoalesces              
                   lhs                      rhs     diff 
     538.imagick_r 1266.00                  2137.00 68.8%
     638.imagick_s 1266.00                  2137.00 68.8%
   631.deepsjeng_s  158.00                   230.00 45.6%
   531.deepsjeng_r  158.00                   230.00 45.6%
         519.lbm_r    3.00                     4.00 33.3%
         619.lbm_s    3.00                     4.00 33.3%
      511.povray_r  690.00                   865.00 25.4%
     526.blender_r 7891.00                  8539.00  8.2%
        625.x264_s 1732.00                  1828.00  5.5%
        525.x264_r 1732.00                  1828.00  5.5%
     520.omnetpp_r  614.00                   640.00  4.2%
     620.omnetpp_s  614.00                   640.00  4.2%
         544.nab_r  164.00                   168.00  2.4%
         644.nab_s  164.00                   168.00  2.4%
       641.leela_s  283.00                   289.00  2.1%
       541.leela_r  283.00                   289.00  2.1%
   500.perlbench_r  996.00                  1012.00  1.6%
   600.perlbench_s  996.00                  1012.00  1.6%
         602.gcc_s 5373.00                  5443.00  1.3%
         502.gcc_r 5373.00                  5443.00  1.3%
      510.parest_r 9374.00                  9484.00  1.2%
   623.xalancbmk_s 1861.00                  1873.00  0.6%
   523.xalancbmk_r 1861.00                  1873.00  0.6%
        508.namd_r 2245.00                  2246.00  0.0%
         605.mcf_s   34.00                    34.00  0.0%
          557.xz_r  181.00                   181.00  0.0%
         505.mcf_r   34.00                    34.00  0.0%
          657.xz_s  181.00                   181.00  0.0%
Geomean difference                                  11.5%
```
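
For reference, the "Geomean difference" row can be reproduced from the per-benchmark counts. The sketch below assumes that the row is the geometric mean of the rhs/lhs ratios minus one; that is an assumption about how the comparison tool aggregates, not something stated in the output. The `rows` list and the `geomean_difference` helper are illustrative names only.

```python
import math

# (lhs, rhs) machine-cse.NumCoalesces counts copied from the table above,
# one pair per row, in the same order.
rows = [
    (1266, 2137), (1266, 2137),  # 538.imagick_r, 638.imagick_s
    (158, 230), (158, 230),      # 631.deepsjeng_s, 531.deepsjeng_r
    (3, 4), (3, 4),              # 519.lbm_r, 619.lbm_s
    (690, 865),                  # 511.povray_r
    (7891, 8539),                # 526.blender_r
    (1732, 1828), (1732, 1828),  # 625.x264_s, 525.x264_r
    (614, 640), (614, 640),      # 520.omnetpp_r, 620.omnetpp_s
    (164, 168), (164, 168),      # 544.nab_r, 644.nab_s
    (283, 289), (283, 289),      # 641.leela_s, 541.leela_r
    (996, 1012), (996, 1012),    # 500.perlbench_r, 600.perlbench_s
    (5373, 5443), (5373, 5443),  # 602.gcc_s, 502.gcc_r
    (9374, 9484),                # 510.parest_r
    (1861, 1873), (1861, 1873),  # 623.xalancbmk_s, 523.xalancbmk_r
    (2245, 2246),                # 508.namd_r
    (34, 34), (34, 34),          # 605.mcf_s, 505.mcf_r
    (181, 181), (181, 181),      # 557.xz_r, 657.xz_s
]


def geomean_difference(pairs):
    """Geometric mean of the rhs/lhs ratios, expressed as a relative change.

    Assumed aggregation: exp(mean(log(rhs / lhs))) - 1.
    """
    log_sum = sum(math.log(rhs / lhs) for lhs, rhs in pairs)
    return math.exp(log_sum / len(pairs)) - 1.0


print(f"Geomean difference: {geomean_difference(rows):.1%}")
```

Under that assumption the result comes out at roughly the 11.5% shown in the table. A geometric mean keeps the two large imagick rows from dominating the summary the way an arithmetic mean of the percentages would.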

povray/imagick/blender have more vector sequences sunk into if/else blocks:

```
Metric: machine-sink.NumSunk

Program            machine-sink.NumSunk                
                   lhs                  rhs       diff 
      511.povray_r   8836.00              9149.00  3.5%
     538.imagick_r  30541.00             30631.00  0.3%
     638.imagick_s  30541.00             30631.00  0.3%
     526.blender_r 119273.00            119323.00  0.0%
    ...
Geomean difference                                 0.1%
```

MachineLICM does more hoisting:

```
Program                                       machinelicm.NumHoisted                machinelicm.NumCSEed               
                                              lhs                    rhs      diff  lhs                  rhs      diff 
FP2017rate/519.lbm_r/519.lbm_r                  275.00                 277.00  0.7%    28.00                28.00  0.0%
FP2017speed/619.lbm_s/619.lbm_s                 280.00                 282.00  0.7%    32.00                32.00  0.0%
FP2017speed/638.imagick_s/638.imagick_s       21459.00               21569.00  0.5% 10250.00             10331.00  0.8%
FP2017rate/538.imagick_r/538.imagick_r        21459.00               21569.00  0.5% 10250.00             10331.00  0.8%
FP2017rate/526.blender_r/526.blender_r        76896.00               76973.00  0.1% 30323.00             30343.00  0.1%
INT2017rate/525.x264_r/525.x264_r              2953.00                2955.00  0.1%  1179.00              1179.00  0.0%
INT2017speed/625.x264_s/625.x264_s             2953.00                2955.00  0.1%  1179.00              1179.00  0.0%
FP2017rate/510.parest_r/510.parest_r          65432.00               65464.00  0.0% 28134.00             28137.00  0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s    5670.00                5670.00  0.0%  2019.00              2019.00  0.0%
INT2017speed/602.gcc_s/602.gcc_s              61200.00               61200.00  0.0% 26398.00             26398.00  0.0%
INT2017speed/605.mcf_s/605.mcf_s                244.00                 244.00  0.0%   107.00               107.00  0.0%
FP2017rate/508.namd_r/508.namd_r               2455.00                2455.00  0.0%  1384.00              1384.00  0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s    9640.00                9640.00  0.0%  3570.00              3570.00  0.0%
INT2017rate/557.xz_r/557.xz_r                  1034.00                1034.00  0.0%   345.00               345.00  0.0%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s    1305.00                1305.00  0.0%   426.00               426.00  0.0%
                           Geomean difference                                  0.1%                                0.1%
```

There's negligible impact on spills/reloads:

```
Program                                       regalloc.NumSpills                regalloc.NumReloads               
                                              lhs                rhs      diff  lhs                 rhs      diff 
FP2017rate/538.imagick_r/538.imagick_r         4124.00            4125.00  0.0% 10418.00            10396.00 -0.2%
FP2017speed/638.imagick_s/638.imagick_s        4124.00            4125.00  0.0% 10418.00            10396.00 -0.2%
FP2017rate/526.blender_r/526.blender_r        13483.00           13484.00  0.0% 27498.00            27499.00  0.0%
FP2017rate/508.namd_r/508.namd_r               6712.00            6712.00  0.0% 16664.00            16664.00  0.0%
INT2017rate/525.x264_r/525.x264_r              2214.00            2214.00  0.0%  4630.00             4630.00  0.0%
INT2017speed/641.leela_s/641.leela_s            319.00             319.00  0.0%   484.00              484.00  0.0%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s     355.00             355.00  0.0%   711.00              711.00  0.0%
INT2017speed/625.x264_s/625.x264_s             2214.00            2214.00  0.0%  4630.00             4630.00  0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s    1816.00            1816.00  0.0%  2982.00             2982.00  0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s     721.00             721.00  0.0%  1212.00             1212.00  0.0%
INT2017speed/605.mcf_s/605.mcf_s                124.00             124.00  0.0%   373.00              373.00  0.0%
INT2017spe...00.perlbench_s/600.perlbench_s    4389.00            4389.00  0.0%  9753.00             9753.00  0.0%
INT2017rate/557.xz_r/557.xz_r                   317.00             317.00  0.0%   611.00              611.00  0.0%
INT2017rate/541.leela_r/541.leela_r             319.00             319.00  0.0%   484.00              484.00  0.0%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r     355.00             355.00  0.0%   711.00              711.00  0.0%
                           Geomean difference                             -0.0%                              -0.0%
```

> Investigating the regalloc cases, and fixing them in separate patches. In particular, the spill/reload with no need to actually spill seems... weird.

I was able to fix the spills (in this diff) by changing the $V0 COPYs to VMV0 COPYs, with the big caveat that it required fixing the earlyclobber constraint that we have on pseudos to satisfy the register group overlap rules. At the risk of going down a rabbit hole here, I'll open up a draft PR to show the idea. I'm not sure it will be worth pursuing, but it is at least possible.

https://github.com/llvm/llvm-project/pull/126850

