<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/61183>61183</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Generate better code for std::bit_floor from libstdc++
</td>
</tr>
<tr>
<th>Labels</th>
<td>
llvm:instcombine,
missed-optimization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
kazutakahirata
</td>
</tr>
</table>
<pre>
We should improve the LLVM IR generated for `std::bit_floor` from libstdc++. Specifically, compiling libstdc++'s `std::bit_floor` should produce LLVM IR that is as good as what we generate for the implementation in our own libc++.
```
#include <bit>
unsigned my_bit_floor(unsigned x) {
return std::bit_floor(x);
}
```
libstdc++
```
$ clang -march=skylake -std=c++20 -O2 -S -emit-llvm bit_floor.cc
```
```
%cmp.i.i = icmp eq i32 %X, 0
%shr.i.i = lshr i32 %X, 1
%0 = tail call i32 @llvm.ctlz.i32(i32 %shr.i.i, i1 false), !range !5
%sub.i.i = sub nuw nsw i32 32, %0
%shl.i.i = shl nuw i32 1, %sub.i.i
%retval.0.i.i = select i1 %cmp.i.i, i32 0, i32 %shl.i.i
ret i32 %retval.0.i.i
```
libc++
```
$ clang -march=skylake -std=c++20 -stdlib=libc++ -nostdinc++ \
-I/usr/lib/llvm-14/include/c++/v1 -O2 -S -emit-llvm bit_floor.cc
```
```
%cmp.i = icmp eq i32 %X, 0
%0 = tail call i32 @llvm.ctlz.i32(i32 %X, i1 false), !range !5
%shl.i = lshr i32 -2147483648, %0
%cond.i = select i1 %cmp.i, i32 0, i32 %shl.i
ret i32 %cond.i
```
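As a rough illustration of why the two sequences compute the same result, here is a minimal C++ sketch of both shapes. The helper names (`clz32`, `bit_floor_shl_form`, `bit_floor_shr_form`, `forms_agree`) are hypothetical and are not taken from either library. It relies on the identity that, for nonzero `x`, `ctlz(x >> 1) == ctlz(x) + 1`, so `1 << (32 - ctlz(x >> 1))` equals `1 << (31 - ctlz(x))`, which is `0x80000000 >> ctlz(x)`.

```cpp
#include <cassert>
#include <cstdint>

// Portable count-leading-zeros. Returns 32 for v == 0, matching
// llvm.ctlz.i32 called with its second argument set to false.
static unsigned clz32(std::uint32_t v) {
    unsigned n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1)
        ++n;
    return n;
}

// Shape of the IR produced from libstdc++: 1 << (32 - ctlz(x >> 1)).
// The shift amount is always in [0, 31], so the shift is well defined.
static std::uint32_t bit_floor_shl_form(std::uint32_t x) {
    if (x == 0) return 0;
    return std::uint32_t{1} << (32 - clz32(x >> 1));
}

// Shape of the IR produced from libc++: 0x80000000 >> ctlz(x).
// The x == 0 case is handled first because shifting by 32 is not allowed.
static std::uint32_t bit_floor_shr_form(std::uint32_t x) {
    if (x == 0) return 0;
    return 0x80000000u >> clz32(x);
}

// Spot-check: for every power of two p, both forms must return p for
// x == p and for x == p | (p - 1), the largest value whose top bit is p.
static bool forms_agree() {
    if (bit_floor_shl_form(0) != bit_floor_shr_form(0)) return false;
    for (unsigned b = 0; b < 32; ++b) {
        const std::uint32_t p = std::uint32_t{1} << b;
        const std::uint32_t hi = p | (p - 1);
        if (bit_floor_shl_form(p) != p || bit_floor_shr_form(p) != p) return false;
        if (bit_floor_shl_form(hi) != p || bit_floor_shr_form(hi) != p) return false;
    }
    return true;
}
```

This is only a spot-check over single-bit boundaries, not a proof, but it covers the cases where the two shift amounts differ the most.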
Here is the value after each LLVM IR instruction:
```
input          0       1       2   0x40000000   0x80000000
----------------------------------------------------------
libstdc++
shr            0       0       1   0x20000000   0x40000000
ctlz          32      32      31            2            1
sub            0       0       1           30           31
shl            1       1       2   0x40000000   0x80000000
sel            0       1       2   0x40000000   0x80000000
libc++
ctlz          32      31      30            1            0
shr       poison       1       2   0x40000000   0x80000000
sel            0       1       2   0x40000000   0x80000000
```
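The libstdc++ rows of the table can be reproduced with a small C++ sketch that records the value after each step. The `ShlTrace` struct and the `ctlz32` and `trace_shl_form` helpers are hypothetical names introduced only for this illustration. Only the libstdc++ sequence is traced, because the libc++ sequence shifts by 32 for an input of 0, which C++ does not allow to be evaluated.

```cpp
#include <cstdint>

// Value after each instruction of the libstdc++-derived sequence.
struct ShlTrace {
    std::uint32_t shr;   // x >> 1
    std::uint32_t ctlz;  // leading zeros of shr (32 when shr == 0)
    std::uint32_t sub;   // 32 - ctlz
    std::uint32_t shl;   // 1 << sub
    std::uint32_t sel;   // x == 0 ? 0 : shl
};

// Portable count-leading-zeros that returns 32 for v == 0.
static std::uint32_t ctlz32(std::uint32_t v) {
    std::uint32_t n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1)
        ++n;
    return n;
}

static ShlTrace trace_shl_form(std::uint32_t x) {
    ShlTrace t;
    t.shr = x >> 1;
    t.ctlz = ctlz32(t.shr);              // always >= 1 because the top bit of shr is clear
    t.sub = 32 - t.ctlz;                 // therefore always in [0, 31]
    t.shl = std::uint32_t{1} << t.sub;   // well defined for every x
    t.sel = (x == 0) ? 0 : t.shl;
    return t;
}
```

For example, `trace_shl_form(0x40000000)` yields `shr = 0x20000000`, `ctlz = 2`, `sub = 30`, and `shl = sel = 0x40000000`, matching the fourth column.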
FWIW, here is the x86 assembly:
libstdc++
```
89 f8             mov    %edi,%eax        ; 25 bytes, critical path length: 6
b9 01 00 00 00    mov    $0x1,%ecx
d1 e8             shr    %eax
f3 0f bd c0       lzcnt  %eax,%eax
f6 d8             neg    %al
85 ff             test   %edi,%edi
c4 e2 79 f7 c1    shlx   %eax,%ecx,%eax
0f 44 c7          cmove  %edi,%eax
```
libc++
```
f3 0f bd c7       lzcnt  %edi,%eax        ; 17 bytes, critical path length: 3
b9 00 00 00 80    mov    $0x80000000,%ecx
c4 e2 7b f7 c1    shrx   %eax,%ecx,%eax
0f 42 c7          cmovb  %edi,%eax
```
</pre>