<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/128441>128441</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Suboptimal codegen involving tzcnt
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
tavianator
</td>
</tr>
</table>
<pre>
Here is a simple loop to find the index of the first different bit in two arrays:
```c
unsigned long crit_bit(unsigned long *a, unsigned long *b, unsigned long size) {
    unsigned long ret = 0;
    #pragma nounroll
    for (unsigned long i = 0; i < size; ++i) {
        unsigned long diff = a[i] ^ b[i];
        int bits = 8 * sizeof(unsigned long);
        ret += __builtin_ctzg(diff, bits);
        if (diff) {
            break;
        }
    }
    return ret;
}
```
On x86-64 with `-O3 -mbmi`, the loop compiles to:
```asm
.LBB0_4:
mov r9, rax
mov r10, qword ptr [rdi + 8*r8]
mov r11, qword ptr [rsi + 8*r8]
mov rax, r11
xor rax, r10
tzcnt rax, rax
mov rbx, r11
xor rbx, r10
cmove rax, rcx
add rax, r9
xor r11, r10
jne .LBB0_6
lea r9, [r8 + 1]
cmp rdx, r8
mov r8, r9
jne .LBB0_4
```
First, it uses `cmove` to produce 64 when the input is zero, but `tzcnt` already returns 64 (the operand size) in that case, so the `cmove` is redundant.
Second, `tzcnt` sets the carry flag when its input is zero, so the flags it leaves behind could handle the `if (diff) break`. Instead, the code repeats the `xor` just to test for zero.
Better code would look something like this:
```asm
xor rax, r10
tzcnt rax, rax
lea rax, [rax + r9]
jnc .LBB0_6
```
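For reference, here is a rough C model of the two points above. It is only a sketch: `tzcnt_model`, `struct tzcnt_result`, and `crit_bit_model` are hypothetical names made up for illustration, and the model assumes the documented `tzcnt` behavior (result equals the operand width and CF is set when the source is zero). It uses `__builtin_ctzl` with an explicit zero check so it does not depend on `__builtin_ctzg` being available.
```c
/* Hypothetical model of one tzcnt: the count plus the carry flag it leaves behind. */
struct tzcnt_result {
    unsigned long count; /* trailing zeros, or the operand width if src == 0 */
    int carry;           /* CF: 1 if src == 0, otherwise 0 */
};

static struct tzcnt_result tzcnt_model(unsigned long src) {
    struct tzcnt_result r;
    r.carry = (src == 0);
    r.count = r.carry ? 8 * sizeof(unsigned long)
                      : (unsigned long)__builtin_ctzl(src);
    return r;
}

/* Same contract as crit_bit(), with one "tzcnt" feeding both the sum and the exit test. */
static unsigned long crit_bit_model(const unsigned long *a, const unsigned long *b,
                                    unsigned long size) {
    unsigned long ret = 0;
    for (unsigned long i = 0; i < size; ++i) {
        struct tzcnt_result r = tzcnt_model(a[i] ^ b[i]);
        ret += r.count;  /* no separate select needed for the zero case */
        if (!r.carry) {  /* CF clear: the words differ, so stop */
            break;
        }
    }
    return ret;
}
```
Each iteration needs only one count and one flag test, which is what the shorter sequence above is meant to express. The model can also serve as a reference when checking that a codegen change keeps the same results.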
</pre>