[llvm] [ModuloSchedule] Implement modulo variable expansion for pipelining (PR #65609)

Thu Sep 7 07:59:46 PDT 2023

ytmukai wrote:

I also created an implementation for AArch64 at https://github.com/ytmukai/llvm-project/tree/swpl-aarch64. I will create a PR for it after this PR is merged, as it depends on this PR.

I measured the effect of modulo variable expansion (MVE) with llvm-test-suite on an Ampere Altra Max, an AArch64 processor with Neoverse N1 cores.
Performance relative to the non-pipelined build is as follows:

| MVE | >3% improvement cases | >3% degradation cases |
| --- | --- | --- |
| Disabled (conventional) | 50 | 222 |
| Enabled | 71 | 115 |

The total number of test cases is 811, selected by enabling TEST_SUITE_BENCHMARKING_ONLY.

With MVE enabled, the number of cases that improve by more than 3% increased by about 42% (from 50 to 71), and the number of cases that degrade by more than 3% dropped from 222 to 115.
Many cases still show degradation. I think this could be improved by adding a process to limit register pressure.
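
As a quick check of the figures above, the relative changes can be recomputed from the table. This is only an illustrative sketch: the counts are copied from the table, and the names `results` and `relative_change` are made up for this example.

```python
# Counts copied from the table above (811 benchmarks in total).
results = {
    "disabled": {"improved": 50, "degraded": 222},
    "enabled": {"improved": 71, "degraded": 115},
}


def relative_change(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


improved_change = relative_change(
    results["disabled"]["improved"], results["enabled"]["improved"]
)
degraded_change = relative_change(
    results["disabled"]["degraded"], results["enabled"]["degraded"]
)

print(f"improved cases: {improved_change:+.1f}%")  # +42.0%
print(f"degraded cases: {degraded_change:+.1f}%")  # -48.2%
```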

The optimization flags used are as follows:
* non-pipelined: -O3
* MVE disabled: non-pipelined flags + -mllvm -aarch64-enable-pipeliner=1 -mllvm -pipeliner-max-stages=1000 -mllvm --pipeliner-max-mii=1000 -mllvm -enable-misched=0 -mllvm -enable-post-misched=0 -mtune=neoverse-n1 -mllvm -pipeliner-enable-copytophi=0
* MVE enabled: MVE disabled flags + -mllvm -pipeliner-mve-cg=1
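
Since each configuration builds on the previous one, the three flag sets can be assembled incrementally. The following is a minimal sketch for anyone scripting a similar comparison; the flags are copied verbatim from the list above, while the variable names and the `mllvm` helper are hypothetical and not part of the PR or of llvm-test-suite.

```python
def mllvm(option):
    """Expand one LLVM option into the two-token form passed to clang."""
    return ["-mllvm", option]


# Baseline: no software pipelining.
NON_PIPELINED = ["-O3"]

# Pipelining enabled with conventional code generation (MVE disabled).
MVE_DISABLED = (
    NON_PIPELINED
    + mllvm("-aarch64-enable-pipeliner=1")
    + mllvm("-pipeliner-max-stages=1000")
    + mllvm("--pipeliner-max-mii=1000")
    + mllvm("-enable-misched=0")
    + mllvm("-enable-post-misched=0")
    + ["-mtune=neoverse-n1"]
    + mllvm("-pipeliner-enable-copytophi=0")
)

# Same as above, plus the MVE code generator from this PR.
MVE_ENABLED = MVE_DISABLED + mllvm("-pipeliner-mve-cg=1")

for name, flags in [
    ("non-pipelined", NON_PIPELINED),
    ("MVE disabled", MVE_DISABLED),
    ("MVE enabled", MVE_ENABLED),
]:
    print(f"{name}: {' '.join(flags)}")
```

Building the lists this way keeps the only difference between the two pipelined configurations to the single `-pipeliner-mve-cg=1` option.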

https://github.com/llvm/llvm-project/pull/65609