<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/60826>60826</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Generate better code for std::bit_floor on x86
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:X86,
missed-optimization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
kazutakahirata
</td>
</tr>
</table>
<pre>
Compile:
```
// clang -std=c++20 -march=skylake -O2
#include <bit>
#include <cstdint>
uint32_t my_bit_floor(uint32_t X) {
  return std::bit_floor(X);
}
```
I get:
```
%2 = icmp ne i32 %0, 0
%3 = zext i1 %2 to i32
%4 = tail call i32 @llvm.ctlz.i32(i32 %0, i1 false) #2, !range !5
%5 = xor i32 %4, 31
%6 = shl nuw i32 %3, %5
89 f8 mov %edi,%eax ; 25 bytes, critical path length = 6
d1 e8 shr %eax
f3 0f bd c0 lzcnt %eax,%eax
f6 d8 neg %al
85 ff test %edi,%edi
b9 01 00 00 00 mov $0x1,%ecx
c4 e2 79 f7 c1 shlx %eax,%ecx,%eax
0f 44 c7 cmove %edi,%eax
```
We could generate:
```
31 c0 xor %eax,%eax ; 16 bytes, critical path length = 3
f3 0f bd cf lzcnt %edi,%ecx
0f 93 c0 setae %al
f7 d1 not %ecx
c4 e2 71 f7 c0 shlx %ecx,%eax,%eax
```
Note that `lzcnt` sets the carry flag if and only if the input is 0, so `setae` gives us 0 for zero input and 1 for nonzero input.
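We also use the fact that `shlx` masks the shift count with `0x1f`, so `not` followed by that implicit mask yields `31 - lzcnt(X)` for nonzero input. For zero input the masked count is 31, but the value being shifted is already 0.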
I've empirically verified the equivalence for all possible values of `uint32_t`.
```
input  0    1    2    0x80000000
--------------------------------
lzcnt  32   31   30   0
setae  0    1    1    1
not    -33  -32  -31  -1
&0x1f  31   0    1    31
shlx   0    1    2    0x80000000
```
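For anyone who wants to reproduce the check without x86 hardware, here is a rough C++ model of the proposed sequence. This is only a sketch: `emulated_lzcnt`, `proposed_bit_floor`, `reference_bit_floor`, and `check_range` are names made up for this illustration, the `lzcnt` and `shlx` behavior is modeled in plain C++ rather than with intrinsics, and the loop-based reference is a stand-in for `std::bit_floor` so that the snippet does not depend on C++20.
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of lzcnt on a 32-bit operand: returns 32 for zero input.
// (The real instruction also sets CF exactly when the input is zero.)
static uint32_t emulated_lzcnt(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Models the proposed sequence: lzcnt, setae, not, shlx.
uint32_t proposed_bit_floor(uint32_t x) {
  uint32_t count = emulated_lzcnt(x);  // lzcnt %edi,%ecx
  uint32_t one_or_zero = (x != 0);     // setae %al (CF clear iff x != 0)
  uint32_t shift = ~count & 0x1f;      // not %ecx, plus shlx's implicit mask
  return one_or_zero << shift;         // shlx %ecx,%eax,%eax
}

// Loop-based stand-in for std::bit_floor: the largest power of two <= x,
// or 0 when x == 0.
uint32_t reference_bit_floor(uint32_t x) {
  uint32_t result = 0;
  for (uint32_t bit = 1; bit != 0 && bit <= x; bit <<= 1)
    result = bit;
  return result;
}

// Asserts agreement on the inclusive range [first, last].
void check_range(uint32_t first, uint32_t last) {
  for (uint64_t v = first; v <= last; ++v) {
    uint32_t x = static_cast<uint32_t>(v);
    assert(proposed_bit_floor(x) == reference_bit_floor(x));
  }
}
```
Calling `check_range(0, UINT32_MAX)` walks every `uint32_t` value. Note that this only checks the arithmetic of the sequence, not the flag behavior of the real `lzcnt` instruction.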
</pre>