<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/91780>91780</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[SLPVectorizer] Missed optimization: missed vectorization with non-clobbering store at the end
</td>
</tr>
<tr>
<th>Labels</th>
<td>
llvm:SLPVectorizer,
missed-optimization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
XChy
</td>
</tr>
</table>
<pre>
Alive2 proof: Alive2 is too slow to verify the vectorized output, so I use GCC as the oracle in the link below.
Godbolt link: https://godbolt.org/z/xe6Tzq8Ge
### Motivating example
For the reduced testcase below:
```c
#include <stdint.h>

void e1000x_core_prepare_arr(uint16_t *arr, uint16_t value)
{
    uint16_t checksum = 0;
    int i;

    arr[11] = value; // vectorized if we remove this line.
    for (i = 0; i < 0x3f; i++) {
        checksum += arr[i];
    }
    arr[0x3f] = checksum; // vectorized if we replace it with "return checksum".
}
```
We get a long `add i16 checksum, arr[i]` chain (https://godbolt.org/z/xe6Tzq8Ge), and SLPVectorizer fails to vectorize it.
After some experiments, I found that removing either store outside the loop enables vectorization, but the codegen is still poor compared with GCC: https://godbolt.org/z/rEfbzhW77
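For illustration, the two variants described by the comments in the testcase can be sketched as follows. The function names are hypothetical (they appear neither in QEMU nor in the Godbolt links), and the bodies are only an assumed reading of "remove this line" and "replace it with return checksum".

```c
#include <stdint.h>

/* Variant A (hypothetical name): the store before the loop is removed. */
void prepare_arr_no_leading_store(uint16_t *arr)
{
    uint16_t checksum = 0;
    for (int i = 0; i < 0x3f; i++) {
        checksum += arr[i];
    }
    arr[0x3f] = checksum; /* trailing store kept */
}

/* Variant B (hypothetical name): the trailing store becomes a return. */
uint16_t prepare_arr_return_checksum(uint16_t *arr, uint16_t value)
{
    uint16_t checksum = 0;
    arr[11] = value; /* leading store kept */
    for (int i = 0; i < 0x3f; i++) {
        checksum += arr[i];
    }
    return checksum;
}
```

Both variants compute the same 16-bit wrapping sum over `arr[0]` through `arr[0x3e]` as the original; they differ only in which store outside the loop is present.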
This looks like a phase-ordering problem in these cases, since the store outside the loop may be merged into other stores after unrolling. But even for the case with only a pure loop, the number of vectorized instructions looks bad: https://godbolt.org/z/oznjWWcKr
### Real-world motivation
This snippet of IR is derived from [qemu/hw/net/e1000x_common.c@e1000x_core_prepare_eeprom](https://github.com/qemu/qemu/blob/dafec285bdbfe415ac6823abdc510e0b92c3f094/hw/net/e1000x_common.c#L195) (after the O3 pipeline).
**Let me know if you can confirm that it's an optimization opportunity, thanks.**
cc @alexey-bataev
</pre>