<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/58467">58467</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] Fold the mul and add into mla
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
vfdff
</td>
</tr>
</table>
<pre>
Test case: https://gcc.godbolt.org/z/Yaa1sj9qc
```
void f(int * __restrict__ a, int * __restrict__ b, int * __restrict__ r) {
  for (int m = 0; m < 64; m++) {
    int c = 0;
    // #pragma unroll
    for (int i = 0; i < 32; i++) {
      c += a[i] * b[m * 32 + i];
    }
    r[m] = c;
  }
}
```
This case was first reported in https://bugs.llvm.org/show_bug.cgi?id=35448. Both gcc and llvm can now SLP-vectorize the loops, but clang's version still has fewer mla instructions in the kernel body.
**gcc: 7 mla** vs. **clang: 4 mla**
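
For readers who want to see what is being folded, here is a minimal sketch in plain C of the multiply-accumulate shape, assuming 4 lanes of 32-bit int per step. The names `mla4` and `f_lanes` are hypothetical and only model the lanes with ordinary arrays; this is not the output of either compiler.
```
/* Hypothetical helper: models one 4-lane multiply-accumulate,
   acc[lane] += x[lane] * y[lane], which is the per-lane effect of mla. */
static void mla4(int acc[4], const int x[4], const int y[4]) {
  for (int lane = 0; lane < 4; lane++) {
    acc[lane] += x[lane] * y[lane];
  }
}

/* Hypothetical rewrite of f(): the 32-element inner loop becomes
   8 multiply-accumulate steps of 4 lanes, followed by a horizontal add.
   Assumes the sums do not overflow int, so reordering the additions
   does not change the result. */
void f_lanes(const int *restrict a, const int *restrict b, int *restrict r) {
  for (int m = 0; m < 64; m++) {
    int acc[4] = {0, 0, 0, 0};
    for (int i = 0; i < 32; i += 4) {
      mla4(acc, &a[i], &b[m * 32 + i]);
    }
    r[m] = acc[0] + acc[1] + acc[2] + acc[3];
  }
}
```
Each `mla4` call stands for a separate multiply followed by an add that could be emitted as a single mla, which is the fold the title asks for.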
</pre>