<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/128441>128441</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Suboptimal codegen involving tzcnt
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          tavianator
      </td>
    </tr>
</table>

<pre>
    Here is a simple loop that finds the index of the first differing bit between two arrays:

```c
unsigned long crit_bit(unsigned long *a, unsigned long *b, unsigned long size) {
    unsigned long ret = 0;
#pragma nounroll
    for (unsigned long i = 0; i < size; ++i) {
        unsigned long diff = a[i] ^ b[i];
        int bits = 8 * sizeof(unsigned long);
        ret += __builtin_ctzg(diff, bits);
        if (diff) {
            break;
        }
    }
    return ret;
}
```
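
For reference, here is a minimal sketch of the result the loop is expected to compute. It does not use `__builtin_ctzg`; `ctz_or_width`, `crit_bit_ref`, and `crit_bit_demo` are hypothetical names introduced only for illustration and are not part of the original report.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: count trailing zero bits with a plain loop,
   returning the bit width of unsigned long when x is zero. */
static unsigned long ctz_or_width(unsigned long x) {
    unsigned long width = sizeof(unsigned long) * CHAR_BIT;
    unsigned long n = 0;
    while (n < width && ((x >> n) & 1UL) == 0) {
        ++n;
    }
    return n;
}

/* Same control flow as crit_bit, built on the portable helper. */
static unsigned long crit_bit_ref(const unsigned long *a,
                                  const unsigned long *b,
                                  unsigned long size) {
    unsigned long ret = 0;
    for (unsigned long i = 0; i < size; ++i) {
        unsigned long diff = a[i] ^ b[i];
        ret += ctz_or_width(diff);
        if (diff) {
            break;
        }
    }
    return ret;
}

/* Worked example: the first words match, the second words differ at bit 2. */
static void crit_bit_demo(void) {
    unsigned long width = sizeof(unsigned long) * CHAR_BIT;
    unsigned long a[2] = {5UL, 8UL};
    unsigned long b[2] = {5UL, 12UL};
    assert(crit_bit_ref(a, b, 2) == width + 2);
}
```

With a 64-bit `unsigned long`, the worked example yields 66: 64 for the matching first word, plus 2 for the position of the lowest differing bit in the second word.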

On x86-64 with `-O3 -mbmi`, the loop generates

```asm
.LBB0_4:
        mov     r9, rax
        mov     r10, qword ptr [rdi + 8*r8]
        mov     r11, qword ptr [rsi + 8*r8]
        mov     rax, r11
        xor     rax, r10
        tzcnt   rax, rax
        mov     rbx, r11
        xor     rbx, r10
        cmove   rax, rcx
        add     rax, r9
        xor     r11, r10
        jne     .LBB0_6
        lea     r9, [r8 + 1]
        cmp     rdx, r8
        mov     r8, r9
        jne     .LBB0_4
```

First, it's using `cmove` to get 64 in case the input was zero, but `tzcnt` already returns 64 in that case so that's redundant.

Second, it could use the flags after `tzcnt` to handle the `if (diff) break`, but instead it re-does the comparison.
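
Both observations rest on the documented behavior of 64-bit `tzcnt`: the destination receives 64 when the source is zero, CF is set when the source is zero, and ZF is set when the result is zero. A small software model makes this concrete. It is only a sketch; `struct tzcnt_model` and `tzcnt64_model` are made-up names, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical software model of 64-bit tzcnt (names are illustrative). */
struct tzcnt_model {
    unsigned long long value; /* destination register contents */
    bool cf;                  /* carry flag: set when the source is zero */
    bool zf;                  /* zero flag: set when the result is zero */
};

static struct tzcnt_model tzcnt64_model(unsigned long long src) {
    struct tzcnt_model r;
    unsigned long long n = 0;
    /* Count trailing zero bits; a zero source runs to the full width. */
    while (n < 64 && ((src >> n) & 1ULL) == 0) {
        ++n;
    }
    r.value = n;
    r.cf = (src == 0);
    r.zf = (n == 0);
    return r;
}
```

In this model, `tzcnt64_model(0)` has `value == 64` and `cf` set, so no separate select is needed to produce 64, and the `if (diff) break` condition corresponds to CF being clear.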

Better code would look something like this:

```asm
        xor     rax, r10
        tzcnt   rax, rax
        lea     rax, [rax + r9]
        jnc     .LBB0_6                 # tzcnt sets CF only when its source is zero, so exit when CF is clear
```
</pre>