<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/126615>126615</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[x86] Unnecessary mov in loop over bits
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
TellowKrinkle
</td>
</tr>
</table>
<pre>
When compiling the following code, clang generates the following assembly: [godbolt link](https://gcc.godbolt.org/z/onf3hEfq4)
```cpp
bool test(int num, bool* arr) {
for (; num; num &= num - 1) {
if (arr[__builtin_ctz(num)])
return true;
}
return false;
}
```
```asm
test(int, bool*):
test edi, edi
je .LBB0_1
.LBB0_2:
rep bsf eax, edi
movzx eax, byte ptr [rsi + rax]
test al, al
jne .LBB0_4
lea ecx, [rdi - 1]
and ecx, edi
mov edi, ecx
jne .LBB0_2
.LBB0_4:
ret
.LBB0_1:
xor eax, eax
ret
```
There are two issues with this:
1. There's no dependency-breaking xor on the `bsf` output. This looks like it could be #33216, but when compiling with `-march=haswell`, clang *does* add the dependency-breaking xor before the `tzcnt`. So that issue appears to have been fixed for this situation, but the fix needs to be extended to `bsf` or applied to the generic processor profile, since the problem affects all pre-Haswell processors (which decode `rep bsf` as `bsf`, not `tzcnt`).
2. At the end, clang does `and ecx, edi; mov edi, ecx` instead of just `and edi, ecx`. This adds an extra cycle to the critical path (ignoring issue 1) for all processors without mov elimination (including Ice Lake processors, which had it disabled due to an erratum).
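
For anyone checking a candidate fix, here is a minimal sketch of what the loop is expected to compute, so the codegen can be changed without changing behavior. The names `set_bit_indices` and `test_unsigned` are hypothetical helpers, not part of the original reproducer, and the counter is `unsigned` here (an assumption made to keep `num - 1` well-defined for every input). `__builtin_ctz` is the GCC/Clang builtin used in the reproducer.
```cpp
#include <vector>

// Hypothetical helper: returns the indices of the set bits of `num`,
// lowest first, using the same loop shape as `test` above.
std::vector<int> set_bit_indices(unsigned num) {
    std::vector<int> out;
    // `num &= num - 1` clears the lowest set bit on each iteration,
    // so the loop runs once per set bit and stops when none remain.
    for (; num; num &= num - 1)
        out.push_back(__builtin_ctz(num));  // index of the lowest set bit
    return out;
}

// Same logic as `test` from the reproducer, with an unsigned counter
// (assumption: the signedness does not matter for the issue described).
bool test_unsigned(unsigned num, const bool* arr) {
    for (; num; num &= num - 1) {
        if (arr[__builtin_ctz(num)])
            return true;
    }
    return false;
}
```
With `num = 0b10110` the loop visits bit indices 1, 2, and 4 in that order, so `test_unsigned` returns true only if one of `arr[1]`, `arr[2]`, or `arr[4]` is set.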
</pre>