<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/131130>131130</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[LoopInterchange] Cost model more dedicated to vectorization
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
kasuga-fj
</td>
</tr>
</table>
<pre>
LoopInterchange can expose vectorization opportunities in some cases. However, the current implementation of LoopInterchange does not take vectorization much into account. There are several issues in the LoopInterchange cost model that need to be addressed to increase the vectorization opportunities.
First, the cost model of LoopInterchange consists of several individual decision rules. They are applied one at a time, with rules applied earlier having higher priority. In the current implementation, the rule based on `CacheCostAnalysis` has the highest priority, and the rule for vectorization has the lowest priority. However, there are cases where it is profitable to interchange the loops for vectorization even if it is detrimental to the cache. For example, interchanging the inner two loops in the following example appears to be about 3x faster on my local machine (compiled with `-O3 -mcpu=neoverse-v2 -mllvm -cache-line-size=64`).
```c
__attribute__((aligned(64))) float aa[256][256],bb[256][256],cc[256][256],dd[256][256],ee[256][256],ff[256][256],gg[256][256];
// Alternative version of TSVC s231 with more array accesses than the original.
void s231_alternative() {
  for (int nl = 0; nl < 100*(100000/256); nl++) {
    for (int i = 0; i < 256; ++i) {
      for (int j = 1; j < 256; j++) {
        aa[j][i] = aa[j - 1][i] + bb[j][i] + cc[i][j] + dd[i][j] + ff[i][j] + gg[i][j];
      }
    }
  }
}
```
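For illustration, the hand-interchanged form of the inner two loops could be sketched as below. This is only a sketch of the transformation, not compiler output; `body_ij`, `body_ji`, `init_data`, and `count_mismatches` are illustrative names that appear neither in the reproducer nor in LLVM, and the outer `nl` loop is omitted.
```c
#define N 256

// Read-only inputs; the names mirror the arrays in the example above.
static float bb[N][N], cc[N][N], dd[N][N], ff[N][N], gg[N][N];

// Original order (i outer, j inner). The recurrence through aa[j - 1][i] is
// carried by the inner j loop, so that loop cannot be vectorized as is.
void body_ij(float aa[N][N]) {
  for (int i = 0; i < N; ++i)
    for (int j = 1; j < N; j++)
      aa[j][i] = aa[j - 1][i] + bb[j][i] + cc[i][j] + dd[i][j] + ff[i][j] + gg[i][j];
}

// Interchanged order (j outer, i inner). The recurrence is now carried by the
// outer loop, so the iterations of the inner i loop are independent. The price
// is that cc, dd, ff and gg are walked with stride N in the inner loop.
void body_ji(float aa[N][N]) {
  for (int j = 1; j < N; j++)
    for (int i = 0; i < N; ++i)
      aa[j][i] = aa[j - 1][i] + bb[j][i] + cc[i][j] + dd[i][j] + ff[i][j] + gg[i][j];
}

// Hypothetical helper: deterministic data, identical for both output arrays.
void init_data(float a1[N][N], float a2[N][N]) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) {
      float v = (float)((i * 7 + j * 3) % 11) * 0.25f;
      a1[i][j] = a2[i][j] = v;
      bb[i][j] = v + 1.0f;
      cc[i][j] = v * 0.5f;
      dd[i][j] = 2.0f - v;
      ff[i][j] = 0.125f;
      gg[i][j] = (float)(j % 3);
    }
}

// Hypothetical helper: number of elements that differ between two results.
int count_mismatches(float a1[N][N], float a2[N][N]) {
  int n = 0;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      if (a1[i][j] != a2[i][j])
        ++n;
  return n;
}
```
Both orders evaluate the same expression for each element and respect the same `aa[j - 1][i]` dependence, so they should produce identical results. Only the memory access pattern of the inner loop differs, which is why a cache-based rule and a vectorization-based rule can disagree on this nest.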
Next, the rule for vectorization in the cost model seems to have a bug. For example, in the following case `isProfitableForVectorization` returns false even though interchanging the loops is necessary in order to vectorize the innermost loop.
```c
__attribute__((aligned(64))) float aa[256][256],bb[256][256],cc[256][256],dd[256][256],ee[256][256];
void f() {
  for (int i = 0; i < 256; ++i) {
    for (int j = 1; j < 256; j++) {
      cc[i][j] *= dd[i][j] + ee[i][j];
      aa[j][i] = aa[j-1][i] + bb[j][i];
    }
  }
}
```
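To make the expected outcome concrete, a hand-interchanged form of `f` is sketched below next to the original order. This is an illustration only, not the output of the pass; `f_ij`, `f_ji`, `fill`, and `differs` are illustrative names that are not part of LLVM or of the reproducer.
```c
#define N 256

// Read-only inputs; the names mirror the arrays in the example above.
static float bb[N][N], dd[N][N], ee[N][N];

// Original order (i outer, j inner). The update of cc is independent per
// element, but the aa recurrence is carried by the inner j loop.
void f_ij(float aa[N][N], float cc[N][N]) {
  for (int i = 0; i < N; ++i)
    for (int j = 1; j < N; j++) {
      cc[i][j] *= dd[i][j] + ee[i][j];
      aa[j][i] = aa[j - 1][i] + bb[j][i];
    }
}

// Interchanged order (j outer, i inner). The aa recurrence moves to the outer
// loop, so no dependence is carried by the inner i loop any more.
void f_ji(float aa[N][N], float cc[N][N]) {
  for (int j = 1; j < N; j++)
    for (int i = 0; i < N; ++i) {
      cc[i][j] *= dd[i][j] + ee[i][j];
      aa[j][i] = aa[j - 1][i] + bb[j][i];
    }
}

// Hypothetical helper: fill an array with small deterministic values.
void fill(float a[N][N], int seed) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      a[i][j] = (float)((i * 5 + j * 3 + seed) % 13) * 0.5f;
}

// Hypothetical helper: number of elements that differ between two arrays.
int differs(float x[N][N], float y[N][N]) {
  int n = 0;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      if (x[i][j] != y[i][j])
        ++n;
  return n;
}
```
The two orders should compute the same `aa` and `cc`, since each `cc[i][j]` is touched exactly once and the `aa[j - 1][i]` dependence is respected in both.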
See also https://godbolt.org/z/f8TW9dG89. The debug output `Cost = 0` means that the profitability decision is delegated to `isProfitableForVectorization`.
Based on the above, I suggest the following changes so that LoopInterchange can make profitability decisions geared toward vectorization.
- Add a new option to give higher priority to the vectorization profitability decision in the cost model.
- Fix the bug in the vectorization profitability decision.
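As a toy model of the rule ordering described earlier, and of what the proposed option would change, consider the sketch below. It is plain C for illustration only: `Verdict`, `LoopPairInfo`, `cache_rule`, `vectorization_rule`, and `decide` are hypothetical names that do not correspond to the actual LoopInterchange code, only two rules are modeled, and the fallback when no rule decides is an assumption.
```c
// Tri-state verdict: a rule may decline to decide and defer to the next rule.
typedef enum { UNDECIDED, PROFITABLE, NOT_PROFITABLE } Verdict;

// Hypothetical summary of one candidate loop pair.
typedef struct {
  int cache_cost_delta;      // < 0: interchange helps the cache, > 0: hurts it
  int enables_vectorization; // nonzero if interchange frees the inner loop
} LoopPairInfo;

typedef Verdict (*Rule)(const LoopPairInfo *);

// Stand-in for the cache-based rule: decides only when the cost changes.
Verdict cache_rule(const LoopPairInfo *p) {
  if (p->cache_cost_delta < 0)
    return PROFITABLE;
  if (p->cache_cost_delta > 0)
    return NOT_PROFITABLE;
  return UNDECIDED;
}

// Stand-in for the vectorization rule.
Verdict vectorization_rule(const LoopPairInfo *p) {
  return p->enables_vectorization ? PROFITABLE : UNDECIDED;
}

// Apply the rules in priority order; the first rule that decides wins.
Verdict decide(const Rule *rules, int n, const LoopPairInfo *p) {
  for (int k = 0; k < n; ++k) {
    Verdict v = rules[k](p);
    if (v != UNDECIDED)
      return v;
  }
  return NOT_PROFITABLE; // assumption: keep the loops when nothing decides
}
```
With a loop pair like the `s231_alternative` case (interchange worsens the cache cost but enables vectorization), the cache-first order rejects the interchange in this model, while an order that puts the vectorization rule first accepts it.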
Any comments are welcome, thanks.
</pre>