<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/110846>110846</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AVX-512] Don't fold vectorized byte shift into `vgf2p8affineqb` when the results are going to be ANDed later anyway
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Validark
      </td>
    </tr>
</table>

<pre>
    This code: ([Godbolt link](https://zig.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:13,fontUsePx:'0',j:3,lang:zig,selection:(endColumn:1,endLineNumber:7,positionColumn:1,positionLineNumber:7,selectionStartColumn:1,selectionStartLineNumber:7,startColumn:1,startLineNumber:7),source:'const+std+%3D+@import(%22std%22)%3B%0Aconst+builtin+%3D+@import(%22builtin%22)%3B%0A%0Aexport+fn+expand8xu8To16xu4AsByteVector(vec:+@Vector(8,+u8))+@Vector(16,+u8)+%7B%0A++++return+std.simd.interlace(.%7B+vec,+vec+%3E%3E+@splat(4)+%7D)+%26+@as(@Vector(16,+u8),+@splat(0xF))%3B%0A%7D%0A%0Aexport+fn+expand8xu8To16xu4AsByteVector2(vec:+@Vector(8,+u8))+@Vector(16,+u8)+%7B%0A++++return+std.simd.interlace(%5B2%5D@Vector(8,+u8)%7B+vec,+@bitCast(@as(@Vector(4,+u16),+@bitCast(vec))+%3E%3E+@splat(4))+%7D)+%26+@as(@Vector(16,+u8),+@splat(0xF))%3B%0A%7D'),l:'5',n:'1',o:'Zig+source+%233',t:'0')),header:(),k:59.471365638766514,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:ztrunk,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:16,fontUsePx:'0',j:1,lang:zig,libs:!(),options:'-O+ReleaseFast+-target+x86_64-linux+-mcpu%3Dznver4',overrides:!(),selection:(endColumn:35,endLineNumber:23,positionColumn:35,positionLineNumber:23,selectionStartColumn:35,selectionStartLineNumber:23,startColumn:35,startLineNumber:23),source:3),l:'5',n:'0',o:'+zig+trunk+(Editor+%233)',t:'0')),header:(),k:40.528634361233486,l:'4',m:100,n:'0',o:'',s:0,t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4))

```zig
export fn expand8xu8To16xu4AsByteVector(vec: @Vector(8, u8)) @Vector(16, u8) {
    return std.simd.interlace(.{ vec, vec >> @splat(4) }) & @as(@Vector(16, u8), @splat(0xF));
}
```

```llvm
define dso_local range(i8 0, 16) <16 x i8> @expand8xu8To16xu4AsByteVector(<8 x i8> %0) local_unnamed_addr {
Entry:
  %1 = lshr <8 x i8> %0, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
  %2 = shufflevector <8 x i8> %0, <8 x i8> %1, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
  %3 = and <16 x i8> %2, <i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15, i8 15>
  ret <16 x i8> %3
}
```

Compiles like so on Zen 3:

```asm
.LCPI0_0:
        .zero 16,15
expand8xu8To16xu4AsByteVector:
        vpsrlw  xmm1, xmm0, 4
        vpunpcklbw      xmm0, xmm0, xmm1
        vpand   xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        ret
```

However, on Zen 4, we get:

```asm
.LCPI0_2:
        .byte   0
        .byte   0
        .byte   0
        .byte   0
        .byte   128
        .byte   64
        .byte   32
        .byte   16
.LCPI0_3:
        .zero   4,15
expand8xu8To16xu4AsByteVector:
        vgf2p8affineqb xmm1, xmm0, qword ptr [rip + .LCPI0_2]{1to2}, 0
        vpunpcklbw xmm0, xmm0, xmm1
        vpandd  xmm0, xmm0, dword ptr [rip + .LCPI0_3]{1to4}
        ret
```

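In this case, `vgf2p8affineqb` is overkill. A plain `vpsrlw` lets bits from each high byte spill into the top of the neighboring low byte, but the subsequent AND with `0xF` clears those bits anyway. The per-byte shift therefore doesn't eliminate the `vpand` instruction and only adds a second constant operand, so it is a pointless optimization here. It should compile like so: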

```asm
.LCPI1_1:
        .zero 4,15
expand8xu8To16xu4AsByteVector2:
        vpsrlw  xmm1, xmm0, 4
        vpunpcklbw      xmm0, xmm0, xmm1
        vpandd  xmm0, xmm0, dword ptr [rip + .LCPI1_1]{1to4}
        ret
```
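
For reference, the expected listing above is labeled `expand8xu8To16xu4AsByteVector2`, which is the second function in the linked Godbolt session. A sketch of it follows, split into named steps for readability (the local names `words` and `shifted` are mine, not from the original source). It performs the shift on 16-bit lanes by hand and relies on the final mask to discard the bits that cross byte boundaries:

```zig
const std = @import("std");

export fn expand8xu8To16xu4AsByteVector2(vec: @Vector(8, u8)) @Vector(16, u8) {
    // Reinterpret the 8 bytes as 4 words so the shift is done per 16-bit lane.
    const words: @Vector(4, u16) = @bitCast(vec);
    // Bits from each word's upper byte spill into the top of its lower byte here.
    const shifted: @Vector(8, u8) = @bitCast(words >> @splat(4));
    // Interleave the original and shifted bytes; the 0xF mask removes the spilled bits.
    return std.simd.interlace([2]@Vector(8, u8){ vec, shifted }) & @as(@Vector(16, u8), @splat(0xF));
}
```

This should produce the same result as the original function for every input, since only the low four bits of each byte survive the mask.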
</pre>