<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/126594>126594</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[RISCV] Enable IPRA for RISC-V
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
mikhailramalho
</td>
</tr>
</table>
<pre>
## Overview
This is a tracking issue for enabling IPRA (interprocedural register allocation) in the RISC-V backend. Enabling IPRA by default depends on validating both its correctness and a measurable performance improvement.
## Status and tracking
IPRA can be enabled experimentally for RISC-V by passing `-mllvm -enable-ipra -mllvm -enable-machine-outliner=never` to a Clang built from HEAD. Progress is tracked in the checklist below:
* [x] Implement getIPRACSRegs hook to fix miscompile due to missed save/restore of `ra`. #125586 (see also #124932)
  * Note: it predated this investigation, but [Craig previously found and fixed another bug exposed by IPRA](https://github.com/llvm/llvm-project/commit/b0f11dfc7506dd33ad5b43be9faba919b70d1959)
* [ ] Resolve negative interaction between machine-outliner and IPRA #119556
  * `-enable-machine-outliner=never` is a usable but temporary workaround.
  * As noted in the linked issue, the same problem exists for other targets that enable the MachineOutliner, such as AArch64.
* [x] Ensure SPEC and the llvm-test-suite pass with IPRA enabled
* [x] Collect static data on the impact of enabling IPRA (e.g. number of functions optimized) for SPEC or other benchmarks.
* [ ] Collect runtime data on the impact of enabling IPRA for SPEC or other benchmarks.
  * Execution time on SpacemiT and instruction count.
  * Compare against a baseline that unconditionally calls `setRequiresCodeGenSCCOrder()` in TargetPassConfig, to exclude differences caused solely by the code reordering that option introduces.
* [ ] Investigate any performance anomalies or regressions
* [ ] (Optionally) Explore whether there is any pessimisation that prevents IPRA from being used on functions that should be able to use it.
* [ ] Benchmark data allowing, propose IPRA be enabled by default for RISC-V
IPRA is enabled by default for AMDGPU but not for other targets. There was a [brief discussion on enabling it for X86](https://github.com/llvm/llvm-project/pull/109597#issuecomment-2375566477).
## Data
### Static Analysis
* The following table shows NumCSROpt (number of functions optimized for callee-saved registers), as reported by RegUsageInfoCollector, on SPEC built with `-march=rva22u64_v`, IPRA and LTO.
* Data from rva22u64_v and rva23u64 are the same.
* Enabling LTO substantially increases the scope for IPRA: the geomean NumCSROpt is 215.6% higher with LTO than without (see appendix).
* Without LTO, IPRA optimizes at least one function in 19 of the 32 SPEC benchmarks; with LTO, in 26 of the 32.
* Adding `-fno-semantic-interposition` does not affect the static data.
```
$ ./utils/compare.py -a -m ip-regalloc.NumCSROpt rva22_v_ipra_flto.json
Tests: 32
Metric: ip-regalloc.NumCSROpt
Program ip-regalloc.NumCSROpt
rva22_v_ipra_flto
FP2017rate/526.blender_r/526.blender_r 239.00
INT2017speed/602.gcc_s/602.gcc_s 61.00
INT2017rate/502.gcc_r/502.gcc_r 61.00
FP2017rate/510.parest_r/510.parest_r 48.00
FP2017rate/511.povray_r/511.povray_r 41.00
INT2017rat...23.xalancbmk_r/523.xalancbmk_r 24.00
INT2017spe...23.xalancbmk_s/623.xalancbmk_s 24.00
FP2017rate/538.imagick_r/538.imagick_r 23.00
FP2017speed/638.imagick_s/638.imagick_s 23.00
INT2017rat...00.perlbench_r/500.perlbench_r 21.00
INT2017spe...00.perlbench_s/600.perlbench_s 21.00
INT2017speed/641.leela_s/641.leela_s 12.00
INT2017rate/541.leela_r/541.leela_r 12.00
INT2017rat...31.deepsjeng_r/531.deepsjeng_r 11.00
INT2017spe...31.deepsjeng_s/631.deepsjeng_s 11.00
FP2017rate/508.namd_r/508.namd_r 9.00
INT2017rate/520.omnetpp_r/520.omnetpp_r 6.00
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s 6.00
INT2017rate/525.x264_r/525.x264_r 5.00
INT2017speed/625.x264_s/625.x264_s 5.00
INT2017rate/557.xz_r/557.xz_r 3.00
INT2017speed/657.xz_s/657.xz_s 3.00
FP2017speed/619.lbm_s/619.lbm_s 2.00
FP2017rate/519.lbm_r/519.lbm_r 2.00
FP2017speed/644.nab_s/644.nab_s 1.00
FP2017rate/544.nab_r/544.nab_r 1.00
FP2017rate...97.specrand_fr/997.specrand_fr
FP2017spee...96.specrand_fs/996.specrand_fs
INT2017rate/505.mcf_r/505.mcf_r
INT2017rat...99.specrand_ir/999.specrand_ir
INT2017speed/605.mcf_s/605.mcf_s
INT2017spe...98.specrand_is/998.specrand_is
ip-regalloc.NumCSROpt
run rva22_v_ipra_flto
count 26.000000
mean 25.961538
std 46.851664
min 1.000000
25% 5.000000
50% 11.500000
75% 23.750000
max 239.000000
```
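As a sanity check on the summary block, the statistics can be recomputed from the per-benchmark values. This is a minimal sketch, not part of compare.py: it assumes the six benchmarks with an empty cell are simply skipped (which would explain `count 26` for 32 tests), and the list name `num_csr_opt` is made up for illustration.

```python
import statistics

# NumCSROpt per benchmark with IPRA + LTO, copied from the table above.
# Assumption: the six benchmarks with an empty cell (mcf and specrand)
# are skipped by the summary, so they are left out here.
num_csr_opt = [
    239, 61, 61, 48, 41, 24, 24, 23, 23, 21, 21, 12, 12,
    11, 11, 9, 6, 6, 5, 5, 3, 3, 2, 2, 1, 1,
]

summary = {
    "count": len(num_csr_opt),
    "mean": statistics.mean(num_csr_opt),
    "std": statistics.stdev(num_csr_opt),      # sample standard deviation
    "median": statistics.median(num_csr_opt),  # the 50% row
}

for name, value in summary.items():
    print(f"{name:>6} {value:.6f}")
```

The recomputed count, mean, sample standard deviation and median line up with the `count`, `mean`, `std` and `50%` rows above.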
### Dynamic Runtime Data
The static count of functions optimised is helpful for understanding the degree to which IPRA changes code generation at all, but runtime performance data is needed to evaluate the extent to which it helps performance.
I'm currently regenerating the data with commit 83fa117f76f9c4c82ce0ca914c4eba268c6c2fa2 and will upload it here asap.
### Appendix: separate compilation vs -flto
```
$ ./utils/compare.py -a rva22_v_ipra.json vs rva22_v_ipra_flto.json -m ip-regalloc.NumCSROpt
Tests: 32
Metric: ip-regalloc.NumCSROpt
Program ip-regalloc.NumCSROpt
lhs rhs diff
FP2017rate/508.namd_r/508.namd_r 0.00 9.00 inf%
INT2017rate/520.omnetpp_r/520.omnetpp_r 0.00 6.00 inf%
INT2017speed/641.leela_s/641.leela_s 0.00 12.00 inf%
FP2017rate/519.lbm_r/519.lbm_r 0.00 2.00 inf%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s 0.00 6.00 inf%
INT2017rate/541.leela_r/541.leela_r 0.00 12.00 inf%
FP2017speed/619.lbm_s/619.lbm_s 0.00 2.00 inf%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s 1.00 24.00 2300.0%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r 1.00 24.00 2300.0%
FP2017rate/510.parest_r/510.parest_r 6.00 48.00 700.0%
FP2017rate/511.povray_r/511.povray_r 6.00 41.00 583.3%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r 2.00 11.00 450.0%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s 2.00 11.00 450.0%
FP2017rate/526.blender_r/526.blender_r 67.00 239.00 256.7%
INT2017rate/557.xz_r/557.xz_r 1.00 3.00 200.0%
INT2017speed/657.xz_s/657.xz_s 1.00 3.00 200.0%
INT2017speed/602.gcc_s/602.gcc_s 24.00 61.00 154.2%
INT2017rate/502.gcc_r/502.gcc_r 24.00 61.00 154.2%
FP2017rate/538.imagick_r/538.imagick_r 11.00 23.00 109.1%
FP2017speed/638.imagick_s/638.imagick_s 11.00 23.00 109.1%
INT2017rat...00.perlbench_r/500.perlbench_r 14.00 21.00 50.0%
INT2017spe...00.perlbench_s/600.perlbench_s 14.00 21.00 50.0%
INT2017rate/525.x264_r/525.x264_r 4.00 5.00 25.0%
INT2017speed/625.x264_s/625.x264_s 4.00 5.00 25.0%
FP2017rate/544.nab_r/544.nab_r 1.00 1.00 0.0%
FP2017speed/644.nab_s/644.nab_s 1.00 1.00 0.0%
FP2017rate...97.specrand_fr/997.specrand_fr 0.00 0.00
FP2017spee...96.specrand_fs/996.specrand_fs 0.00 0.00
INT2017rate/505.mcf_r/505.mcf_r 0.00 0.00
INT2017rat...99.specrand_ir/999.specrand_ir 0.00 0.00
INT2017speed/605.mcf_s/605.mcf_s 0.00 0.00
INT2017spe...98.specrand_is/998.specrand_is 0.00 0.00
Geomean difference 215.6%
l/r lhs rhs diff
count 32.000000 32.000000 26.000000
mean 6.093750 21.093750 inf
std 12.957398 43.315318 NaN
min 0.000000 0.000000 0.000000
25% 0.000000 1.750000 1.090909
50% 1.000000 7.500000 3.533582
75% 6.000000 23.000000 NaN
max 67.000000 239.000000 inf
```
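The geomean difference and the 19 vs 26 benchmark counts can be cross-checked from the table above. The sketch below assumes the geomean is taken over the per-benchmark rhs/lhs ratios, using only benchmarks whose lhs (non-LTO) count is non-zero. That assumption is inferred from the numbers, not from reading compare.py. The dictionary keys are shortened benchmark names and the helper names are hypothetical.

```python
import math

# (without LTO, with LTO) NumCSROpt per benchmark, copied from the table above.
counts = {
    "508.namd_r": (0, 9),        "520.omnetpp_r": (0, 6),
    "641.leela_s": (0, 12),      "519.lbm_r": (0, 2),
    "620.omnetpp_s": (0, 6),     "541.leela_r": (0, 12),
    "619.lbm_s": (0, 2),         "623.xalancbmk_s": (1, 24),
    "523.xalancbmk_r": (1, 24),  "510.parest_r": (6, 48),
    "511.povray_r": (6, 41),     "531.deepsjeng_r": (2, 11),
    "631.deepsjeng_s": (2, 11),  "526.blender_r": (67, 239),
    "557.xz_r": (1, 3),          "657.xz_s": (1, 3),
    "602.gcc_s": (24, 61),       "502.gcc_r": (24, 61),
    "538.imagick_r": (11, 23),   "638.imagick_s": (11, 23),
    "500.perlbench_r": (14, 21), "600.perlbench_s": (14, 21),
    "525.x264_r": (4, 5),        "625.x264_s": (4, 5),
    "544.nab_r": (1, 1),         "644.nab_s": (1, 1),
    "997.specrand_fr": (0, 0),   "996.specrand_fs": (0, 0),
    "505.mcf_r": (0, 0),         "999.specrand_ir": (0, 0),
    "605.mcf_s": (0, 0),         "998.specrand_is": (0, 0),
}


def benchmarks_using_ipra(column):
    """Count benchmarks with at least one optimized function (0 = no LTO, 1 = LTO)."""
    return sum(1 for pair in counts.values() if pair[column] > 0)


def geomean_increase_pct():
    """Geomean of rhs/lhs ratios, skipping zero baselines (assumed behaviour)."""
    ratios = [rhs / lhs for lhs, rhs in counts.values() if lhs > 0]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


print(benchmarks_using_ipra(0), benchmarks_using_ipra(1))
print(f"{geomean_increase_pct():.1f}%")
```

Under that assumption, the seven benchmarks that go from 0 to a non-zero count (the `inf%` rows) do not contribute to the 215.6% figure.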
</pre>