<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/111773>111773</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
`load_bytes` inner loop isn't unrolled by the compiler
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dcci
</td>
</tr>
</table>
<pre>
Godbolt link:
https://godbolt.org/z/aE8Mq3j5G
The function (from libstdc++) looks like this:
```
std::size_t load_bytes_loop(const char* p, int n) {
  std::size_t result = 0;
  --n;
  do {
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  } while (--n >= 0);
  return result;
}
```
Manually unrolling the function to make it look like:
```
std::size_t load_bytes_switch(const char* p, int n) {
  std::size_t result = 0;
  switch (n & 7) {
    case 7: result |= std::size_t(p[6]) << 48;
    case 6: result |= std::size_t(p[5]) << 40;
    case 5: result |= std::size_t(p[4]) << 32;
    case 4: result |= std::size_t(p[3]) << 24;
    case 3: result |= std::size_t(p[2]) << 16;
    case 2: result |= std::size_t(p[1]) << 8;
    case 1: result |= std::size_t(p[0]);
  }
  return result;
}
```
results in better codegen (and a performance win, reported in [1]).
It would be good if Clang could do this sort of transformation automatically, as this code is used a lot in the wild.
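
For anyone experimenting with this, here is a minimal sketch that checks the loop and the unrolled form agree for n = 1..7. It is not the libstdc++ patch. The helper `byte_at`, the name `load_bytes_switch_zext`, and `check_forms_agree` are hypothetical and exist only for this illustration. The sketch assumes a 64-bit `std::size_t`, and it zero-extends each byte through `unsigned char` (as the loop does), since a plain `std::size_t(p[i])` would sign-extend bytes >= 0x80 on targets where `char` is signed. Note that the loop form requires n >= 1, and the switch form returns 0 whenever `n & 7` is 0.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(std::size_t) == 8, "sketch assumes a 64-bit size_t");

// Loop form from the report; the do/while means it requires n >= 1.
std::size_t load_bytes_loop(const char* p, int n) {
  std::size_t result = 0;
  --n;
  do {
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  } while (--n >= 0);
  return result;
}

// Hypothetical helper: read one byte and zero-extend it to size_t.
inline std::size_t byte_at(const char* p, int i) {
  return static_cast<unsigned char>(p[i]);
}

// Unrolled form; every case deliberately falls through to the next one.
std::size_t load_bytes_switch_zext(const char* p, int n) {
  std::size_t result = 0;
  switch (n & 7) {
    case 7: result |= byte_at(p, 6) << 48;  // fall through
    case 6: result |= byte_at(p, 5) << 40;  // fall through
    case 5: result |= byte_at(p, 4) << 32;  // fall through
    case 4: result |= byte_at(p, 3) << 24;  // fall through
    case 3: result |= byte_at(p, 2) << 16;  // fall through
    case 2: result |= byte_at(p, 1) << 8;   // fall through
    case 1: result |= byte_at(p, 0);
  }
  return result;
}

// Compare both forms on a buffer that includes bytes >= 0x80.
void check_forms_agree() {
  const char buf[7] = {'\x01', '\x80', '\x7f', '\xff', '\x10', '\xa5', '\xfe'};
  for (int n = 1; n <= 7; ++n) {
    assert(load_bytes_loop(buf, n) == load_bytes_switch_zext(buf, n));
  }
}
```

Calling `check_forms_agree()` from a test `main` should pass silently when both forms match on that buffer.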
[1] https://gcc.gnu.org/pipermail/libstdc++/2024-October/059634.html
Kudos to @ilvokhin and @ot for finding it and doing the analysis on real workloads.
</pre>