<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/126615>126615</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [x86] Unnecessary mov in loop over bits
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          TellowKrinkle
      </td>
    </tr>
</table>

<pre>
    Clang generates the assembly below when compiling the following code ([godbolt link](https://gcc.godbolt.org/z/onf3hEfq4)):
```cpp
bool test(int num, bool* arr) {
    for (; num; num &= num - 1) {
        if (arr[__builtin_ctz(num)])
            return true;
    }
    return false;
}
```
```asm
test(int, bool*):
        test    edi, edi
        je      .LBB0_1
.LBB0_2:
        rep       bsf eax, edi
        movzx   eax, byte ptr [rsi + rax]
        test    al, al
        jne     .LBB0_4
        lea     ecx, [rdi - 1]
        and     ecx, edi
        mov     edi, ecx
        jne     .LBB0_2
.LBB0_4:
        ret
.LBB0_1:
        xor     eax, eax
        ret
```

There are two issues with this:
1. There's no dependency-breaking xor on the bsf output. This looks like it could be #33216, but with `-march=haswell` clang *does* add the dependency-breaking xor before the tzcnt. So that issue seems to have been fixed for tzcnt, and the fix needs to be extended to bsf or applied to the generic processor profile, since the missing xor affects all pre-Haswell processors (which decode `rep bsf` as `bsf`, not `tzcnt`).
2. At the end of the loop, clang emits `and ecx, edi; mov edi, ecx` instead of just `and edi, ecx`. This adds an extra cycle to the critical path (ignoring issue 1) for all processors without mov elimination (including Ice Lake processors, where it was disabled due to an erratum).
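
For anyone checking a codegen change against this report, here is a minimal sketch of a behavioral reference. `test` is copied from above; `test_reference` is a hypothetical helper (not part of the original report) that checks each bit position explicitly, so the two should agree for any non-negative `num`.

```cpp
#include <cassert>

// Function from the report, unchanged: visits set bits from lowest to
// highest, clearing the lowest set bit on each iteration.
bool test(int num, bool* arr) {
    for (; num; num &= num - 1) {
        if (arr[__builtin_ctz(num)])
            return true;
    }
    return false;
}

// Hypothetical reference (an assumption for illustration): tests every
// bit position in turn instead of using ctz and clear-lowest-bit.
bool test_reference(int num, const bool* arr) {
    for (int bit = 0; bit < 32; ++bit) {
        if (((static_cast<unsigned>(num) >> bit) & 1u) && arr[bit])
            return true;
    }
    return false;
}
```

Comparing the two over a handful of inputs (for example `num = 0xA` with only `arr[3]` set) should be enough to confirm that a different instruction selection for the loop tail does not change the result.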
</pre>