<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/55066>55066</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Sub-optimal code generation of two successive shuffles
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dzaima
</td>
</tr>
</table>
<pre>
The following IR compiles to four shuffle instructions and a blend for AVX2, even though each of the two `shufflevector` instructions is exactly representable by a single AVX2 shuffle instruction:
```llvm
define <8 x i32> @f(<32 x i8> %0) {
%2 = shufflevector <32 x i8> %0, <32 x i8> poison, <32 x i32> <i32 0, i32 4, i32 8, i32 12, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 16, i32 20, i32 24, i32 28, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%3 = bitcast <32 x i8> %2 to <8 x i32>
%4 = shufflevector <8 x i32> %3, <8 x i32> poison, <8 x i32> <i32 0, i32 4, i32 0, i32 4, i32 0, i32 4, i32 0, i32 4>
ret <8 x i32> %4
}
```
(https://godbolt.org/z/qe6PY5Ynq)
The same code using C intrinsics, with a comparison to GCC: https://godbolt.org/z/6o93csEea
LLVM:
```asm
vpermq ymm1, ymm0, 238 # ymm1 = ymm0[2,3,2,3]
vpshufb ymm1, ymm1, ymmword ptr [rip + .LCPI0_0] # ymm1 = ymm1[u,u,u,u,0,4,8,12,u,u,u,u,0,4,8,12,u,u,u,u,16,20,24,28,u,u,u,u,16,20,24,28]
vpermq ymm0, ymm0, 68 # ymm0 = ymm0[0,1,0,1]
vpshufb ymm0, ymm0, ymmword ptr [rip + .LCPI0_1] # ymm0 = ymm0[0,4,8,12,u,u,u,u,0,4,8,12,u,u,u,u,16,20,24,28,u,u,u,u,16,20,24,28,u,u,u,u]
vpblendd ymm0, ymm0, ymm1, 170 # ymm0 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
ret
```
GCC:
```asm
vmovdqa ymm1, YMMWORD PTR .LC1[rip]
vpshufb ymm0, ymm0, YMMWORD PTR .LC0[rip]
vpermd ymm0, ymm1, ymm0
ret
```
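For reference, here is a scalar sketch of what `@f` computes. It is only a model, not LLVM's or GCC's lowering. The name `shuffle_model` is hypothetical, and undef lanes are modelled as zero. The first step only moves bytes within each 128-bit half (an in-lane byte shuffle, which `vpshufb` can do). The second step picks dwords across halves (a cross-lane dword permute, which `vpermd` can do).

```c
#include <stdint.h>

/* Hypothetical scalar model of the IR function @f above.
 * Assumption: undef lanes are modelled as zero. */
static void shuffle_model(const uint8_t src[32], uint32_t dst[8]) {
    uint8_t tmp[32] = {0};

    /* First shufflevector: gather every 4th byte of each 128-bit half
     * into the low 4 bytes of that half (indices 0,4,8,12 and 16,20,24,28). */
    for (int i = 0; i < 4; i++) {
        tmp[i]      = src[4 * i];
        tmp[16 + i] = src[16 + 4 * i];
    }

    /* Bitcast <32 x i8> to <8 x i32>, written out as little-endian
     * byte assembly to match x86. */
    uint32_t dwords[8];
    for (int i = 0; i < 8; i++) {
        dwords[i] = (uint32_t)tmp[4 * i]
                  | ((uint32_t)tmp[4 * i + 1] << 8)
                  | ((uint32_t)tmp[4 * i + 2] << 16)
                  | ((uint32_t)tmp[4 * i + 3] << 24);
    }

    /* Second shufflevector: indices <0,4,0,4,0,4,0,4>. */
    for (int i = 0; i < 8; i++) {
        dst[i] = dwords[(i & 1) ? 4 : 0];
    }
}
```

Feeding in bytes 0..31 makes the data flow easy to follow: every even result lane holds source bytes 0, 4, 8, 12 and every odd lane holds source bytes 16, 20, 24, 28.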
</pre>