<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/55066>55066</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Sub-optimal code generation of two successive shuffles
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          dzaima
      </td>
    </tr>
</table>

<pre>
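    On AVX2, the following IR compiles to four shuffle instructions and a blend, even though each of the two `shufflevector` instructions is exactly representable as a single instruction (an in-lane byte shuffle and a cross-lane dword permute, which GCC emits as `vpshufb` and `vpermd`):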
```llvm
define <8 x i32> @f(<32 x i8> %0) {
  %2 = shufflevector <32 x i8> %0, <32 x i8> poison, <32 x i32> <i32 0, i32 4, i32 8, i32 12, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 16, i32 20, i32 24, i32 28, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %3 = bitcast <32 x i8> %2 to <8 x i32>
  %4 = shufflevector <8 x i32> %3, <8 x i32> poison, <8 x i32> <i32 0, i32 4, i32 0, i32 4, i32 0, i32 4, i32 0, i32 4>
  ret <8 x i32> %4
}
```
(https://godbolt.org/z/qe6PY5Ynq)

The same shuffles written with C intrinsics, with GCC's output for comparison: https://godbolt.org/z/6o93csEea

LLVM:
```asm
vpermq   ymm1, ymm0, 238                          # ymm1 = ymm0[2,3,2,3]
vpshufb  ymm1, ymm1, ymmword ptr [rip + .LCPI0_0] # ymm1 = ymm1[u,u,u,u,0,4,8,12,u,u,u,u,0,4,8,12,u,u,u,u,16,20,24,28,u,u,u,u,16,20,24,28]
vpermq   ymm0, ymm0, 68                           # ymm0 = ymm0[0,1,0,1]
vpshufb  ymm0, ymm0, ymmword ptr [rip + .LCPI0_1] # ymm0 = ymm0[0,4,8,12,u,u,u,u,0,4,8,12,u,u,u,u,16,20,24,28,u,u,u,u,16,20,24,28,u,u,u,u]
vpblendd ymm0, ymm0, ymm1, 170                    # ymm0 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
ret
```

GCC:
```asm
vmovdqa ymm1, YMMWORD PTR .LC1[rip]
vpshufb ymm0, ymm0, YMMWORD PTR .LC0[rip]
vpermd  ymm0, ymm1, ymm0
ret
```
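
To sanity-check that a `vpshufb` followed by a `vpermd` really is enough, here is a minimal scalar C sketch that models both instructions and compares the result against the IR's semantics. The function names are hypothetical, the control constants are my assumption of what `.LC0` and `.LC1` would need to hold (the real values are not shown above), and `undef` bytes are modeled as zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of AVX2 vpshufb on 256 bits: each 16-byte lane is shuffled
   independently; a control byte with bit 7 set produces zero. */
void model_vpshufb(uint8_t dst[32], const uint8_t src[32],
                   const uint8_t ctrl[32]) {
    uint8_t tmp[32];
    for (int i = 0; i < 32; i++) {
        int lane_base = i & 16; /* 0 for the low lane, 16 for the high lane */
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[lane_base + (ctrl[i] & 0x0F)];
    }
    memcpy(dst, tmp, 32);
}

/* Scalar model of AVX2 vpermd: 32-bit element i of the result is element
   (idx[i] & 7) of the source; lane crossing is allowed. */
void model_vpermd(uint8_t dst[32], const uint8_t src[32],
                  const uint32_t idx[8]) {
    uint8_t tmp[32];
    for (int i = 0; i < 8; i++)
        memcpy(tmp + 4 * i, src + 4 * (idx[i] & 7), 4);
    memcpy(dst, tmp, 32);
}

/* Reference semantics of @f: dword 0 of the intermediate holds bytes
   0,4,8,12 and dword 4 holds bytes 16,20,24,28; the second shuffle then
   alternates those two dwords across the result. */
void reference_f(uint8_t dst[32], const uint8_t in[32]) {
    for (int d = 0; d < 8; d++) {
        int base = (d & 1) ? 16 : 0;
        for (int b = 0; b < 4; b++)
            dst[4 * d + b] = in[base + 4 * b];
    }
}

/* Assumed two-instruction lowering: one vpshufb, then one vpermd. */
void two_instruction_f(uint8_t dst[32], const uint8_t in[32]) {
    enum { Z = 0x80 }; /* "zero this byte" control value (assumed constant) */
    static const uint8_t shuf_ctrl[32] = {
        0, 4, 8, 12, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z,  /* low lane  */
        0, 4, 8, 12, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z   /* high lane */
    };
    static const uint32_t perm_idx[8] = {0, 4, 0, 4, 0, 4, 0, 4};
    uint8_t tmp[32];
    model_vpshufb(tmp, in, shuf_ctrl);
    model_vpermd(dst, tmp, perm_idx);
}

/* Compares both paths on one arbitrary input; returns 1 on success. */
int self_check(void) {
    uint8_t in[32], want[32], got[32];
    for (int i = 0; i < 32; i++)
        in[i] = (uint8_t)(i * 7 + 3);
    reference_f(want, in);
    two_instruction_f(got, in);
    assert(memcmp(want, got, 32) == 0);
    return 1;
}
```

Calling `self_check()` from a `main` is enough to exercise it. The model only covers the defined result bytes, so it says nothing about which lowering LLVM should pick, only that the two-instruction form is sufficient.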
</pre>