[llvm] [RISCV] Move VMV0 elimination past machine SSA opts (PR #126850)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 19 01:13:21 PST 2025
lukel97 wrote:
> I see two paths forward:
>
> 1. Performance data which shows this is strongly net positive.
I haven't been able to get any dynamic data yet, but I got some static results which line up with what I was expecting. There's a significant increase in the number of copies coalesced by MachineCSE:
```
Metric: machine-cse.NumCoalesces
Program             machine-cse.NumCoalesces
                    lhs       rhs       diff
538.imagick_r       1266.00   2137.00   68.8%
638.imagick_s       1266.00   2137.00   68.8%
631.deepsjeng_s     158.00    230.00    45.6%
531.deepsjeng_r     158.00    230.00    45.6%
519.lbm_r           3.00      4.00      33.3%
619.lbm_s           3.00      4.00      33.3%
511.povray_r        690.00    865.00    25.4%
526.blender_r       7891.00   8539.00   8.2%
625.x264_s          1732.00   1828.00   5.5%
525.x264_r          1732.00   1828.00   5.5%
520.omnetpp_r       614.00    640.00    4.2%
620.omnetpp_s       614.00    640.00    4.2%
544.nab_r           164.00    168.00    2.4%
644.nab_s           164.00    168.00    2.4%
641.leela_s         283.00    289.00    2.1%
541.leela_r         283.00    289.00    2.1%
500.perlbench_r     996.00    1012.00   1.6%
600.perlbench_s     996.00    1012.00   1.6%
602.gcc_s           5373.00   5443.00   1.3%
502.gcc_r           5373.00   5443.00   1.3%
510.parest_r        9374.00   9484.00   1.2%
623.xalancbmk_s     1861.00   1873.00   0.6%
523.xalancbmk_r     1861.00   1873.00   0.6%
508.namd_r          2245.00   2246.00   0.0%
605.mcf_s           34.00     34.00     0.0%
557.xz_r            181.00    181.00    0.0%
505.mcf_r           34.00     34.00     0.0%
657.xz_s            181.00    181.00    0.0%
Geomean difference                      11.5%
```
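For anyone reading the `diff` and `Geomean difference` columns, here is a minimal sketch of the arithmetic, using the machine-cse.NumCoalesces numbers above. It assumes `diff` is the relative change `(rhs - lhs) / lhs` and that the geomean difference is the relative change between the geometric means of the two columns. That is an assumption about how the comparison tool computes it, not something taken from its source, and the helper names are made up for illustration.

```python
import math

# (lhs, rhs) pairs copied from the machine-cse.NumCoalesces table above.
ROWS = {
    "538.imagick_r": (1266, 2137),
    "638.imagick_s": (1266, 2137),
    "631.deepsjeng_s": (158, 230),
    "531.deepsjeng_r": (158, 230),
    "519.lbm_r": (3, 4),
    "619.lbm_s": (3, 4),
    "511.povray_r": (690, 865),
    "526.blender_r": (7891, 8539),
    "625.x264_s": (1732, 1828),
    "525.x264_r": (1732, 1828),
    "520.omnetpp_r": (614, 640),
    "620.omnetpp_s": (614, 640),
    "544.nab_r": (164, 168),
    "644.nab_s": (164, 168),
    "641.leela_s": (283, 289),
    "541.leela_r": (283, 289),
    "500.perlbench_r": (996, 1012),
    "600.perlbench_s": (996, 1012),
    "602.gcc_s": (5373, 5443),
    "502.gcc_r": (5373, 5443),
    "510.parest_r": (9374, 9484),
    "623.xalancbmk_s": (1861, 1873),
    "523.xalancbmk_r": (1861, 1873),
    "508.namd_r": (2245, 2246),
    "605.mcf_s": (34, 34),
    "557.xz_r": (181, 181),
    "505.mcf_r": (34, 34),
    "657.xz_s": (181, 181),
}


def pct_diff(lhs, rhs):
    """Relative change from lhs to rhs, in percent (hypothetical helper)."""
    return (rhs - lhs) / lhs * 100.0


def geomean(values):
    """Geometric mean computed via logs; assumes every value is positive."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_diff(rows):
    """Relative change between the geomeans of the lhs and rhs columns."""
    lhs = geomean(l for l, _ in rows.values())
    rhs = geomean(r for _, r in rows.values())
    return pct_diff(lhs, rhs)


print(f"{pct_diff(*ROWS['538.imagick_r']):.1f}%")  # 68.8%
print(f"{geomean_diff(ROWS):.1f}%")                # 11.5%
```

Under those assumptions the sketch reproduces both the per-benchmark 68.8% for imagick and the 11.5% geomean from the table.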
povray/imagick/blender have more vector sequences sunk into if/else blocks:
```
Metric: machine-sink.NumSunk
Program             machine-sink.NumSunk
                    lhs         rhs         diff
511.povray_r        8836.00     9149.00     3.5%
538.imagick_r       30541.00    30631.00    0.3%
638.imagick_s       30541.00    30631.00    0.3%
526.blender_r       119273.00   119323.00   0.0%
...
Geomean difference                          0.1%
```
MachineLICM does more hoisting:
```
Program                                      machinelicm.NumHoisted      machinelicm.NumCSEed
                                             lhs       rhs       diff    lhs       rhs       diff
FP2017rate/519.lbm_r/519.lbm_r               275.00    277.00    0.7%    28.00     28.00     0.0%
FP2017speed/619.lbm_s/619.lbm_s              280.00    282.00    0.7%    32.00     32.00     0.0%
FP2017speed/638.imagick_s/638.imagick_s      21459.00  21569.00  0.5%    10250.00  10331.00  0.8%
FP2017rate/538.imagick_r/538.imagick_r       21459.00  21569.00  0.5%    10250.00  10331.00  0.8%
FP2017rate/526.blender_r/526.blender_r       76896.00  76973.00  0.1%    30323.00  30343.00  0.1%
INT2017rate/525.x264_r/525.x264_r            2953.00   2955.00   0.1%    1179.00   1179.00   0.0%
INT2017speed/625.x264_s/625.x264_s           2953.00   2955.00   0.1%    1179.00   1179.00   0.0%
FP2017rate/510.parest_r/510.parest_r         65432.00  65464.00  0.0%    28134.00  28137.00  0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s  5670.00   5670.00   0.0%    2019.00   2019.00   0.0%
INT2017speed/602.gcc_s/602.gcc_s             61200.00  61200.00  0.0%    26398.00  26398.00  0.0%
INT2017speed/605.mcf_s/605.mcf_s             244.00    244.00    0.0%    107.00    107.00    0.0%
FP2017rate/508.namd_r/508.namd_r             2455.00   2455.00   0.0%    1384.00   1384.00   0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s  9640.00   9640.00   0.0%    3570.00   3570.00   0.0%
INT2017rate/557.xz_r/557.xz_r                1034.00   1034.00   0.0%    345.00    345.00    0.0%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s  1305.00   1305.00   0.0%    426.00    426.00    0.0%
Geomean difference                                               0.1%                        0.1%
```
There's negligible impact on spills/reloads:
```
Program                                      regalloc.NumSpills          regalloc.NumReloads
                                             lhs       rhs       diff    lhs       rhs       diff
FP2017rate/538.imagick_r/538.imagick_r       4124.00   4125.00   0.0%    10418.00  10396.00  -0.2%
FP2017speed/638.imagick_s/638.imagick_s      4124.00   4125.00   0.0%    10418.00  10396.00  -0.2%
FP2017rate/526.blender_r/526.blender_r       13483.00  13484.00  0.0%    27498.00  27499.00  0.0%
FP2017rate/508.namd_r/508.namd_r             6712.00   6712.00   0.0%    16664.00  16664.00  0.0%
INT2017rate/525.x264_r/525.x264_r            2214.00   2214.00   0.0%    4630.00   4630.00   0.0%
INT2017speed/641.leela_s/641.leela_s         319.00    319.00    0.0%    484.00    484.00    0.0%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s  355.00    355.00    0.0%    711.00    711.00    0.0%
INT2017speed/625.x264_s/625.x264_s           2214.00   2214.00   0.0%    4630.00   4630.00   0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s  1816.00   1816.00   0.0%    2982.00   2982.00   0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s  721.00    721.00    0.0%    1212.00   1212.00   0.0%
INT2017speed/605.mcf_s/605.mcf_s             124.00    124.00    0.0%    373.00    373.00    0.0%
INT2017spe...00.perlbench_s/600.perlbench_s  4389.00   4389.00   0.0%    9753.00   9753.00   0.0%
INT2017rate/557.xz_r/557.xz_r                317.00    317.00    0.0%    611.00    611.00    0.0%
INT2017rate/541.leela_r/541.leela_r          319.00    319.00    0.0%    484.00    484.00    0.0%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r  355.00    355.00    0.0%    711.00    711.00    0.0%
Geomean difference                                               -0.0%                       -0.0%
```
> Investigating the regalloc cases, and fixing them in separate patches. In particular, the spill/reload with no need to actually spill seems... weird.
I was able to fix the spills (in this diff) by changing the $V0 COPYs to VMV0 COPYs, but with the big caveat that it required fixing the earlyclobber constraint we have on pseudos, which is required for the register group overlap rules. At the risk of going down a rabbit hole here, I'll open up a draft PR to show the idea. I'm not sure if it will be worth pursuing, but it is at least possible.
https://github.com/llvm/llvm-project/pull/126850