<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/60826>60826</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Generate better code for std::bit_floor on x86
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:X86,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          kazutakahirata
      </td>
    </tr>
</table>

<pre>
    Compile:

```
// clang -std=c++20 -march=skylake -O2
#include <bit>
#include <cstdint>

uint32_t my_bit_floor(uint32_t X) {
  return std::bit_floor(X);
}
```

With the command above, I get the following LLVM IR and x86 assembly:

```
  %2 = icmp ne i32 %0, 0
  %3 = zext i1 %2 to i32
  %4 = tail call i32 @llvm.ctlz.i32(i32 %0, i1 false) #2, !range !5
  %5 = xor i32 %4, 31
  %6 = shl nuw i32 %3, %5

  89 f8                   mov    %edi,%eax        ; 25 bytes, critical path length = 6
  d1 e8                   shr    %eax
  f3 0f bd c0             lzcnt  %eax,%eax
  f6 d8                   neg    %al
  85 ff                   test   %edi,%edi
  b9 01 00 00 00          mov    $0x1,%ecx
  c4 e2 79 f7 c1          shlx   %eax,%ecx,%eax
  0f 44 c7                cmove  %edi,%eax
```

We could generate:

```
  31 c0                   xor    %eax,%eax        ; 16 bytes, critical path length = 3
  f3 0f bd cf             lzcnt  %edi,%ecx
  0f 93 c0                setae  %al
  f7 d1                   not    %ecx
  c4 e2 71 f7 c0          shlx   %ecx,%eax,%eax
```

Note that `lzcnt` sets the carry flag if and only if its input is 0, so `setae` (set if CF = 0) yields 0 for a zero input and 1 for a nonzero input.

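We also rely on the fact that `shlx` with 32-bit operands masks the shift count with `0x1f`, so `not` followed by that masking turns `lzcnt(x)` into `31 - lzcnt(x)`, the position of the highest set bit.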

I've empirically verified the equivalence for all possible values of `uint32_t`.

```
input     0    1    2  0x80000000
---------------------------------
lzcnt    32   31   30           0
setae     0    1    1           1
not     -33  -32  -31          -1
&0x1f    31    0    1          31
shlx      0    1    2  0x80000000
```
</pre>