<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/141347>141347</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86] `const << (x&7)` doesn't use `shlx` when BMI2 is available
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          dzaima
      </td>
    </tr>
</table>

<pre>
    These functions:
```c
#include <stdint.h>

void shl_u8(uint8_t* dst, uint64_t c) {
    *dst = 1 << (c&7);
}
void shr_u8(uint8_t* dst, uint64_t c) {
    *dst = 0xaa >> (c&7);
}
```
compiled with `-O3 -march=haswell` produce:
```asm
shl_u8:
        mov     rcx, rsi
        and     cl, 7
        mov     al, 1
        shl     al, cl
        mov     byte ptr [rdi], al
        ret

shr_u8:
        mov     rcx, rsi
        and     cl, 7
        mov     al, -86
        shr     al, cl
        mov     byte ptr [rdi], al
        ret
```
but they could use `shlx` & `shrx` as gcc does, e.g.:
```asm
shl_u8:
        and     esi, 7
        mov     eax, 1
        shlx    esi, eax, esi
        mov     BYTE PTR [rdi], sil
        ret
```
This matters even more in a loop: clang's version reloads the constant on every iteration, whereas `shlx`/`shrx` can reuse a constant loaded once outside the loop. The result is 4 uops per iteration on Haswell for clang versus 1 uop per iteration for gcc.
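
For reference, a minimal sketch of the kind of loop meant here. The names `shl_u8_loop` and `shr_u8_loop` and the array-based signatures are made up for illustration and are not necessarily what the godbolt link uses:
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop form of shl_u8: the constant 1 is loop-invariant,
   so with shlx it could stay in one register across iterations. */
void shl_u8_loop(uint8_t* dst, const uint64_t* c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(1 << (c[i] & 7));
    }
}

/* Hypothetical loop form of shr_u8, same idea with the constant 0xaa. */
void shr_u8_loop(uint8_t* dst, const uint64_t* c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(0xaa >> (c[i] & 7));
    }
}
```
With the legacy `shl`/`shr` form the shifted register is overwritten, so the constant has to be rematerialized each iteration; the three-operand `shlx`/`shrx` form leaves the source register intact.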

https://godbolt.org/z/Yc57PsWKE
</pre>