<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/143940>143940</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Suboptimal register allocation when combining stores
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
neildhar
</td>
</tr>
</table>
<pre>
I have the following function:
```
void foo(unsigned long long *base, unsigned long long *lim) {
  while (base < lim) {
    *(base++) = 0;
    *(base++) = 0;
    *(base++) = 0;
    *(base++) = 0;
    // Prevent emitting a call to memset
    asm volatile("");
  }
}
```
Clang generates the following for the loop:
```
.LBB0_2:
add x8, x0, #32
stp q0, q0, [x0]
cmp x8, x1
mov x0, x8
b.lo .LBB0_2
```
However, the `add` and `mov` seem to be unnecessary. The add can be folded into the store by using post-indexed write-back, which updates x0 in place and so removes the need for the move as well. The loop can then be written more efficiently as:
```
.LBB0_2:
stp q0, q0, [x0], #32
cmp x0, x1
b.lo .LBB0_2
```
https://godbolt.org/z/acqMscs9K
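For reference, the desired loop corresponds to a source-level shape where the four stores use fixed offsets from the pointer and the pointer is bumped once per iteration. Below is a minimal sketch of that shape; `zero4_postinc` and `foo_bump_once` are hypothetical names that are not part of the original reproducer, and I have not verified whether Clang emits the post-indexed `stp` for this variant.
```c
/* Hypothetical helper (not from the original reproducer): zero four
   64-bit words starting at base, then return the pointer advanced by
   four elements (32 bytes), mirroring a post-indexed store. */
static unsigned long long *zero4_postinc(unsigned long long *base) {
  base[0] = 0;
  base[1] = 0;
  base[2] = 0;
  base[3] = 0;
  return base + 4;
}

/* Same behavior as foo: clears 32 bytes per iteration while base < lim. */
void foo_bump_once(unsigned long long *base, unsigned long long *lim) {
  while (base < lim) {
    base = zero4_postinc(base);
    /* Prevent emitting a call to memset, as in the original */
    __asm__ volatile("");
  }
}
```
Like the original, this assumes the distance from base to lim is a multiple of four elements; otherwise the last iteration writes past lim.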
</pre>