<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/150954">150954</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
x86 missed optimization for __builtin_clz()^31
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Alcaro
</td>
</tr>
</table>
<pre>
```c
#include <stdint.h>
uint8_t b(uint32_t in)
{
uint8_t ret = __builtin_clz(in) ^ 31;
return ret;
}
uint8_t c(uint32_t in)
{
uint8_t ret = __builtin_clz(in) ^ 31;
return ret + 1;
}
```
Expected: since `b` compiles to a single `bsr eax, edi`, `c` should need at most one more instruction than `b`.
Actual:
```
b(unsigned int):
bsr eax, edi
ret
c(unsigned int):
bsr ecx, edi
xor ecx, 31
mov al, 32
sub al, cl
ret
```
GCC produces the expected output (probably because it normalizes the expression less aggressively), as does Clang if I add an extra optimization barrier: https://godbolt.org/z/3ansdnMxa
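
For reference, the assembly for `c` reads as if the expression was rewritten to `32 - clz(in)`: `bsr` followed by `xor ecx, 31` reconstructs the `clz` value, which is then subtracted from 32. That is only my reading of the output, not a confirmed statement about which pass does it. A minimal sketch showing that the two forms agree for nonzero inputs (the names `c_original`, `c_normalized` and `count_mismatches` are made up for illustration):

```c
#include <stdint.h>

/* Form from the report: (clz ^ 31) + 1. Undefined for in == 0. */
static uint8_t c_original(uint32_t in)
{
    uint8_t ret = __builtin_clz(in) ^ 31;
    return ret + 1;
}

/* Form the emitted code appears to compute: 32 - clz (assumption). */
static uint8_t c_normalized(uint32_t in)
{
    return 32 - __builtin_clz(in);
}

/* Hypothetical helper: for each possible top bit, compare both forms on
   the smallest and largest input with that top bit set. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int bit = 0; bit < 32; bit++) {
        uint32_t lo = UINT32_C(1) << bit; /* only the top bit set */
        uint32_t hi = lo | (lo - 1);      /* top bit and all lower bits set */
        if (c_original(lo) != c_normalized(lo)) mismatches++;
        if (c_original(hi) != c_normalized(hi)) mismatches++;
    }
    return mismatches;
}
```

Since `clz(in)` is in the range 0..31 for nonzero input, `clz(in) ^ 31` equals `31 - clz(in)`, so both functions return the same value. The `bsr` result is already `31 - clz(in)`, which is why a single increment after `bsr` would be enough.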
</pre>