<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/141347">141347</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[X86] `const << (x&7)` doesn't use `shlx` when BMI2 is available
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dzaima
</td>
</tr>
</table>
<pre>
These functions:
```c
void shl_u8(uint8_t* dst, uint64_t c) {
  *dst = 1 << (c&7);
}
void shr_u8(uint8_t* dst, uint64_t c) {
  *dst = 0xaa >> (c&7);
}
```
compiled with `-O3 -march=haswell` produce:
```asm
shl_u8:
mov rcx, rsi
and cl, 7
mov al, 1
shl al, cl
mov byte ptr [rdi], al
ret
shr_u8:
mov rcx, rsi
and cl, 7
mov al, -86
shr al, cl
mov byte ptr [rdi], al
ret
```
but they could use `shlx` and `shrx` instead, as gcc does, e.g.:
```asm
shl_u8:
and esi, 7
mov eax, 1
shlx esi, eax, esi
mov BYTE PTR [rdi], sil
ret
```
This matters even more in a loop: clang's version ends up reloading the constant every iteration, because `shl`/`shr` overwrite the register holding it, whereas `shlx`/`shrx` write to a separate destination and can reuse a constant kept outside the loop. On Haswell, clang's version ends up taking 4 uops per iteration, versus 1 uop for gcc's.
https://godbolt.org/z/Yc57PsWKE
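
For illustration, a minimal sketch of the loop case mentioned above. The name `shl_u8_loop` and its signature are hypothetical and not taken from the godbolt link; plain C types stand in for `uint8_t`/`uint64_t` so that the snippet needs no headers. It makes no claim about what any particular compiler emits (a compiler may, for instance, vectorize the loop); it only shows the shape of code where the shifted constant is loop-invariant:
```c
/* Hypothetical loop variant of shl_u8: one single-bit mask per input value.
 * unsigned char / unsigned long long stand in for uint8_t / uint64_t. */
void shl_u8_loop(unsigned char *dst, const unsigned long long *c, unsigned long n) {
  for (unsigned long i = 0; i < n; i++) {
    /* The constant 1 is loop-invariant; only the shift count changes. */
    dst[i] = (unsigned char)(1u << (c[i] & 7));
  }
}
```
With a non-destructive shift, the `1` could in principle stay in one register for the whole loop, which is the reuse described above.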
</pre>