[llvm] [CodeGen][MachinePipeliner] Limit register pressure when scheduling (PR #74807)

Thu Jan 4 02:30:11 PST 2024

kasuga-fj wrote:

Thank you for your reply! Here are the answers to your questions (the description has also been updated).

> So, does it mean that, when MachinePipeliner is enabled on AArch64 pipeliner-register-pressure will be enabled by default?

Yes, that is what we'd like to do.

> What about pipeliner-ii-search-range and pipeliner-register-pressure-margin? Will the improved II search method reduce the need to use pipeliner-ii-search-range? And will pipeliner-register-pressure-margin remain available for fine tuning?

As you said, `pipeliner-register-pressure-margin` will remain available for fine tuning. I'm not sure whether our improved II search method reduces the need for `pipeliner-ii-search-range`. However, it may allow us to use the obvious upper limit of II (the sum of the latencies of all instructions in the loop) instead of a user-specified one, without a large increase in compile time.
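
To illustrate what I mean by the obvious upper limit, here is a rough sketch in Python. It is not the code in this patch or in MachinePipeliner: the function names, the `fits` predicate, and the latency values are all made up for illustration, and the plain linear scan only stands in for whatever search method is actually used.

```python
# Hypothetical sketch only; none of these names exist in LLVM.

def ii_upper_bound(latencies):
    """Obvious upper limit of II: the sum of the latencies of all
    instructions in the loop body."""
    return sum(latencies)


def search_ii(min_ii, max_ii, fits):
    """Return the smallest II in [min_ii, max_ii] accepted by `fits`,
    or None if no candidate is accepted.

    `fits` is an assumed predicate standing in for "a schedule was found
    and its register pressure is within the limit".
    """
    for ii in range(min_ii, max_ii + 1):
        if fits(ii):
            return ii
    return None


# Made-up latencies for a five-instruction loop body.
loop_latencies = [3, 3, 1, 2, 6]
max_ii = ii_upper_bound(loop_latencies)

# Made-up acceptance rule: pretend every II of 12 or more is acceptable.
chosen = search_ii(5, max_ii, lambda ii: ii >= 12)
print(max_ii, chosen)
```

The point is only that the upper end of the search range can be derived from the loop itself, so the range would not have to be chosen by the user.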

> IIUIC, by following the discourse thread and checking the results at #65609 (comment), this patch reduces the number of cycles needed to execute the loop of https://github.com/AMReX-Codes/amrex/blob/9e35dc19489dc5d312e92781cb0471d282cf8370/Src/LinearSolvers/MLMG/AMReX_MLNodeLap_2D_K.H#L584, with some modifications. In this test code, with this patch applied, II would be changed from 11 to 20, to avoid spills/fills, which results in the number of cycles per iteration going down from 29.3 to 16.5, without MVE (#65609), and from 19.6 to 15.7 with MVE. Is that correct?

You are right. Sorry that you had to go out of your way to find it.

> Also, it would be nice if the modifications made to the test code could be made public, so that others can try to reproduce the results.

My colleague @ytmukai is working on publishing it. I believe it will be published soon.

> Are there any improvements with the unmodified version too?

We have not been able to confirm any improvement with the unmodified version, because the analysis required for the pipeliner doesn't work well on it and MachinePipeliner cannot be applied. We recognize that this is an issue and would like to resolve it.

> Finally, it would help if you could try this patch with other benchmarks, like SPEC CPU 2017, if it's not too much work, to check how it impacts the performance of other workloads.

For the same reason as above, we have not been able to check the performance with other benchmarks. Please let us leave this as future work.

https://github.com/llvm/llvm-project/pull/74807