<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/126594>126594</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [RISCV] Enable IPRA for RISC-V
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          mikhailramalho
      </td>
    </tr>
</table>

<pre>
## Overview

This is a tracking issue for enabling IPRA (interprocedural register allocation) in the RISC-V backend. Enabling IPRA by default depends on validating both its correctness and a measurable performance improvement.
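The sketch below is a toy model of the effect IPRA aims for. It assumes a simplified view of register allocation. Without IPRA, a caller must assume that a callee clobbers every caller-saved register of the standard RISC-V calling convention (`ra`, `t0`-`t6`, `a0`-`a7`), so any value kept in one of those registers across the call has to be saved and restored. With IPRA, the caller can instead rely on the set of registers the callee is known to clobber. The helper names and inputs are hypothetical and do not correspond to LLVM APIs.

```python
# Toy model only: illustrates the idea behind IPRA, not LLVM's implementation.

# Caller-saved registers in the standard RISC-V integer calling convention.
CALLER_SAVED = {"ra"} | {f"t{i}" for i in range(7)} | {f"a{i}" for i in range(8)}


def usable_across_call(callee_clobbers, ipra):
    """Return the caller-saved registers that keep their value across a call.

    Without IPRA the caller must assume every caller-saved register is
    clobbered. With IPRA it can use the (hypothetical) set of registers the
    callee is known to clobber. `ra` is never usable, because the call
    instruction itself (jal/jalr) overwrites it.
    """
    if not ipra:
        return set()
    return CALLER_SAVED - set(callee_clobbers) - {"ra"}


def saves_needed(live_values, callee_clobbers, ipra):
    """Count live values in caller-saved registers that need a save/restore.

    `live_values` maps a value name to the register it occupies at the call.
    """
    safe = usable_across_call(callee_clobbers, ipra)
    return sum(
        1
        for reg in live_values.values()
        if reg in CALLER_SAVED and reg not in safe
    )


# Hypothetical leaf callee that only writes a0, a1 and t0.
leaf_clobbers = {"a0", "a1", "t0"}
live = {"x": "t1", "y": "t2", "z": "a0"}

print(saves_needed(live, leaf_clobbers, ipra=False))  # 3
print(saves_needed(live, leaf_clobbers, ipra=True))   # 1
```

In this model only the value held in `a0` still needs saving with IPRA, because `t1` and `t2` are known to survive the call.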

## Status and tracking

IPRA can be enabled experimentally for RISC-V with a Clang built from HEAD by passing `-mllvm -enable-ipra -mllvm -enable-machine-outliner=never`. The remaining work is tracked below:

* [x] Implement the `getIPRACSRegs` hook to fix a miscompile caused by a missing save/restore of `ra`. #125586 (see also #124932)
  * Note: this predated the investigation, but [Craig previously found and fixed another bug exposed by IPRA](https://github.com/llvm/llvm-project/commit/b0f11dfc7506dd33ad5b43be9faba919b70d1959)
* [ ] Resolve the negative interaction between the MachineOutliner and IPRA #119556
  * `-enable-machine-outliner=never` is a usable but temporary workaround.
  * As noted in the linked issue, the same problem exists for other targets that enable the MachineOutliner, such as AArch64.
* [x] Ensure SPEC and the llvm-test-suite pass with IPRA enabled
* [x] Collect static data on the impact of enabling IPRA (e.g. number of functions optimized) for SPEC or other benchmarks.
* [ ] Collect runtime data on the impact of enabling IPRA for SPEC or other benchmarks.
  * Execution time on SpacemiT hardware, and instruction count.
  * Compare against a baseline that unconditionally calls `setRequiresCodeGenSCCOrder()` in TargetPassConfig, to exclude differences caused solely by the code reordering that this option introduces.
* [ ] Investigate any performance anomalies or regressions
* [ ] (Optionally) Explore whether there is any pessimisation that prevents IPRA from being used on functions that should be able to use it.
* [ ] If the benchmark data supports it, propose enabling IPRA by default for RISC-V

IPRA is enabled by default for AMDGPU but not for any other target. There was a [brief discussion on enabling it for X86](https://github.com/llvm/llvm-project/pull/109597#issuecomment-2375566477).

## Data

### Static Analysis

* The following table shows NumCSROpt (the number of functions optimized for callee-saved registers), as reported by RegUsageInfoCollector, for SPEC built with `-march=rva22u64_v`, IPRA, and LTO.
* The results for `rva22u64_v` and `rva23u64` are identical.
* Enabling LTO substantially increases the scope for IPRA: NumCSROpt shows a geomean increase of 215.6% with LTO enabled (see appendix).
* Out of 32 SPEC benchmarks in total, IPRA is used at least once in 19 without LTO and in 26 with LTO.
* Adding `-fno-semantic-interposition` does not affect the static data.

```
$ ./utils/compare.py -a -m ip-regalloc.NumCSROpt rva22_v_ipra_flto.json
Tests: 32
Metric: ip-regalloc.NumCSROpt

Program                                      ip-regalloc.NumCSROpt
                                             rva22_v_ipra_flto
FP2017rate/526.blender_r/526.blender_r       239.00
INT2017speed/602.gcc_s/602.gcc_s              61.00
INT2017rate/502.gcc_r/502.gcc_r               61.00
FP2017rate/510.parest_r/510.parest_r          48.00
FP2017rate/511.povray_r/511.povray_r          41.00
INT2017rat...23.xalancbmk_r/523.xalancbmk_r   24.00
INT2017spe...23.xalancbmk_s/623.xalancbmk_s   24.00
FP2017rate/538.imagick_r/538.imagick_r        23.00
FP2017speed/638.imagick_s/638.imagick_s       23.00
INT2017rat...00.perlbench_r/500.perlbench_r   21.00
INT2017spe...00.perlbench_s/600.perlbench_s   21.00
INT2017speed/641.leela_s/641.leela_s          12.00
INT2017rate/541.leela_r/541.leela_r           12.00
INT2017rat...31.deepsjeng_r/531.deepsjeng_r   11.00
INT2017spe...31.deepsjeng_s/631.deepsjeng_s   11.00
FP2017rate/508.namd_r/508.namd_r               9.00
INT2017rate/520.omnetpp_r/520.omnetpp_r        6.00
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s    6.00
INT2017rate/525.x264_r/525.x264_r              5.00
INT2017speed/625.x264_s/625.x264_s             5.00
INT2017rate/557.xz_r/557.xz_r                  3.00
INT2017speed/657.xz_s/657.xz_s                 3.00
FP2017speed/619.lbm_s/619.lbm_s                2.00
FP2017rate/519.lbm_r/519.lbm_r                 2.00
FP2017speed/644.nab_s/644.nab_s                1.00
FP2017rate/544.nab_r/544.nab_r                 1.00
FP2017rate...97.specrand_fr/997.specrand_fr
FP2017spee...96.specrand_fs/996.specrand_fs
INT2017rate/505.mcf_r/505.mcf_r
INT2017rat...99.specrand_ir/999.specrand_ir
INT2017speed/605.mcf_s/605.mcf_s
INT2017spe...98.specrand_is/998.specrand_is

       ip-regalloc.NumCSROpt
run        rva22_v_ipra_flto
count              26.000000
mean               25.961538
std                46.851664
min                 1.000000
25%                 5.000000
50%                11.500000
75%                23.750000
max               239.000000
```
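As a cross-check of the summary statistics above, the sketch below recomputes them from the 26 non-empty rows of the table (the specrand and mcf rows, which report no value, are left out, matching `count 26`). It assumes that `statistics.quantiles(..., method="inclusive")` interpolates the same way as the pandas `describe()` output printed by compare.py; the variable names are illustrative only.

```python
import statistics

# NumCSROpt for the rva22_v_ipra_flto run, copied from the table above.
# Rows with no reported value (specrand_*, mcf) are excluded.
num_csr_opt = [
    239, 61, 61, 48, 41, 24, 24, 23, 23, 21, 21, 12, 12,
    11, 11, 9, 6, 6, 5, 5, 3, 3, 2, 2, 1, 1,
]

# Quartiles using linear interpolation between the closest data points.
q1, median, q3 = statistics.quantiles(num_csr_opt, n=4, method="inclusive")

print(f"count {len(num_csr_opt)}")
print(f"mean  {statistics.mean(num_csr_opt):.6f}")
print(f"std   {statistics.stdev(num_csr_opt):.6f}")  # sample standard deviation
print(f"25%   {q1:.6f}")
print(f"50%   {median:.6f}")
print(f"75%   {q3:.6f}")
```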

### Dynamic Runtime Data

The static count of optimised functions shows the degree to which IPRA changes code generation at all, but runtime performance data is needed to evaluate whether, and by how much, it improves performance.

I am currently regenerating the data with commit 83fa117f76f9c4c82ce0ca914c4eba268c6c2fa2 and will upload it here as soon as possible.

### Appendix: separate compilation vs -flto
```
$ ./utils/compare.py -a rva22_v_ipra.json vs rva22_v_ipra_flto.json -m ip-regalloc.NumCSROpt
Tests: 32
Metric: ip-regalloc.NumCSROpt

Program                                      ip-regalloc.NumCSROpt
                                                lhs     rhs     diff
FP2017rate/508.namd_r/508.namd_r               0.00    9.00     inf%
INT2017rate/520.omnetpp_r/520.omnetpp_r        0.00    6.00     inf%
INT2017speed/641.leela_s/641.leela_s           0.00   12.00     inf%
FP2017rate/519.lbm_r/519.lbm_r                 0.00    2.00     inf%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s    0.00    6.00     inf%
INT2017rate/541.leela_r/541.leela_r            0.00   12.00     inf%
FP2017speed/619.lbm_s/619.lbm_s                0.00    2.00     inf%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s    1.00   24.00  2300.0%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r    1.00   24.00  2300.0%
FP2017rate/510.parest_r/510.parest_r           6.00   48.00   700.0%
FP2017rate/511.povray_r/511.povray_r           6.00   41.00   583.3%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r    2.00   11.00   450.0%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s    2.00   11.00   450.0%
FP2017rate/526.blender_r/526.blender_r        67.00  239.00   256.7%
INT2017rate/557.xz_r/557.xz_r                  1.00    3.00   200.0%
INT2017speed/657.xz_s/657.xz_s                 1.00    3.00   200.0%
INT2017speed/602.gcc_s/602.gcc_s              24.00   61.00   154.2%
INT2017rate/502.gcc_r/502.gcc_r               24.00   61.00   154.2%
FP2017rate/538.imagick_r/538.imagick_r        11.00   23.00   109.1%
FP2017speed/638.imagick_s/638.imagick_s       11.00   23.00   109.1%
INT2017rat...00.perlbench_r/500.perlbench_r   14.00   21.00    50.0%
INT2017spe...00.perlbench_s/600.perlbench_s   14.00   21.00    50.0%
INT2017rate/525.x264_r/525.x264_r              4.00    5.00    25.0%
INT2017speed/625.x264_s/625.x264_s             4.00    5.00    25.0%
FP2017rate/544.nab_r/544.nab_r                 1.00    1.00     0.0%
FP2017speed/644.nab_s/644.nab_s                1.00    1.00     0.0%
FP2017rate...97.specrand_fr/997.specrand_fr    0.00    0.00
FP2017spee...96.specrand_fs/996.specrand_fs    0.00    0.00
INT2017rate/505.mcf_r/505.mcf_r                0.00    0.00
INT2017rat...99.specrand_ir/999.specrand_ir    0.00    0.00
INT2017speed/605.mcf_s/605.mcf_s               0.00    0.00
INT2017spe...98.specrand_is/998.specrand_is    0.00    0.00
                           Geomean difference                  215.6%
l/r          lhs         rhs       diff
count  32.000000   32.000000  26.000000
mean    6.093750   21.093750        inf
std    12.957398   43.315318        NaN
min     0.000000    0.000000   0.000000
25%     0.000000    1.750000   1.090909
50%     1.000000    7.500000   3.533582
75%     6.000000   23.000000        NaN
max    67.000000  239.000000        inf
```
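The 215.6% geomean difference can be reproduced from the appendix table with the sketch below, assuming the geomean is taken only over benchmarks whose separate-compilation (lhs) value is nonzero, since a ratio against 0 is infinite or undefined. That this is exactly how compare.py filters rows is an assumption; the fact that the numbers match suggests, but does not confirm, it. The same data also yields the 19 and 26 benchmark counts quoted in the Static Analysis bullets.

```python
import math

# (lhs, rhs) NumCSROpt pairs copied from the appendix table:
# lhs = separate compilation, rhs = -flto. Rate and speed variants are separate rows.
pairs = [
    (0, 9), (0, 6), (0, 12), (0, 2), (0, 6), (0, 12), (0, 2),  # reported as inf%
    (1, 24), (1, 24), (6, 48), (6, 41), (2, 11), (2, 11), (67, 239),
    (1, 3), (1, 3), (24, 61), (24, 61), (11, 23), (11, 23),
    (14, 21), (14, 21), (4, 5), (4, 5), (1, 1), (1, 1),
    (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0),  # specrand_*, mcf
]

# Assumption: rows with a zero baseline are excluded from the geomean.
ratios = [rhs / lhs for lhs, rhs in pairs if lhs > 0]
geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

without_lto = sum(lhs > 0 for lhs, _ in pairs)
with_lto = sum(rhs > 0 for _, rhs in pairs)
print(f"benchmarks using IPRA: {without_lto} without LTO, {with_lto} with LTO")
print(f"geomean difference: {geomean - 1:.1%}")
```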
</pre>