<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/102946>102946</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Inverted shift optimizations should incorporate global offsets
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Validark
</td>
</tr>
</table>
<pre>
Let's say I have this function:
```zig
export fn foo(m: u64) u64 {
    const x = ~m & (m << 1);
    return (@as(u64, 1) << @intCast(@popCount(x) + 1)) - 1;
}
```
LLVM handily optimizes it to the equivalent of this:
```zig
export fn foo(m: u64) u64 {
    const x = ~m & (m << 1);
    return ~(~@as(u64, 0) << @intCast(@popCount(x) + 1));
}
```
However, we can do slightly better by folding the `+ 1` into the shifted constant, i.e. pre-shifting `~@as(u64, 0)` by 1. Since `~@as(u64, 0) << 1` is `~@as(u64, 1)`, we get:
```zig
export fn bar(m: u64) u64 {
    const x = ~m & (m << 1);
    return ~(~@as(u64, 1) << @intCast(@popCount(x)));
}
```
Here is the assembly version:
```asm
foo:
lea rax, [rdi + rdi]
mov rcx, -1
andn rax, rdi, rax
popcnt rax, rax
inc al ; we can remove this increment by changing `rcx` to -2
shlx rax, rcx, rax
not rax
ret
bar:
lea rax, [rdi + rdi]
mov rcx, -2
andn rax, rdi, rax
popcnt rax, rax
shlx rax, rcx, rax
not rax
ret
```
This optimization should be applicable across architectures, since it only changes the shifted constant and drops the increment. ([Godbolt link](https://zig.godbolt.org/z/qer96vaqs))
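
As a quick sanity check, the two forms can be compared with a small Zig test. This is only a sketch: the sample inputs and the test name are my own picks (not from any existing test suite), and it assumes a recent Zig where `@intCast` infers the shift-amount type, as in the snippets above. The fold relies on `@popCount(x) + 1` staying below 64, which holds here because `x` can never have two adjacent bits set, so its popcount is at most 32.
```zig
const std = @import("std");

// Original form: the shift amount carries the `+ 1`.
fn foo(m: u64) u64 {
    const x = ~m & (m << 1);
    return (@as(u64, 1) << @intCast(@popCount(x) + 1)) - 1;
}

// Proposed form: the `+ 1` is folded into the shifted constant.
fn bar(m: u64) u64 {
    const x = ~m & (m << 1);
    return ~(~@as(u64, 1) << @intCast(@popCount(x)));
}

test "foo and bar agree on sample inputs" {
    // Hand-picked inputs (an assumption, not exhaustive).
    // 0x5555555555555555 gives x = 0xAAAAAAAAAAAAAAAA, the maximum popcount of 32.
    const inputs = [_]u64{ 0, 1, 0x5555555555555555, 0xAAAAAAAAAAAAAAAA, 0xDEADBEEF, ~@as(u64, 0) };
    for (inputs) |m| {
        try std.testing.expectEqual(foo(m), bar(m));
    }
}
```
Saved to a file, this should run with `zig test`. It only spot-checks a handful of values, so it is a smoke test rather than a proof of equivalence.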
</pre>