<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/62688>62688</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Vector bswap should be combined to shufflevector
</td>
</tr>
<tr>
<th>Labels</th>
<td>
llvm:instcombine,
missed-optimization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
chfast
</td>
</tr>
</table>
<pre>
The motivation is C++ code like this:
```cpp
using uint64_t = unsigned long;
struct U256
{
    uint64_t w[4];
};

U256 bs(const U256& x)
{
    U256 o;
    o.w[3] = __builtin_bswap64(x.w[0]);
    o.w[2] = __builtin_bswap64(x.w[1]);
    o.w[1] = __builtin_bswap64(x.w[2]);
    o.w[0] = __builtin_bswap64(x.w[3]);
    return o;
}
```
A simplified IR equivalent is something like:
```llvm
define <4 x i64> @bswap256(<4 x i64> %x) local_unnamed_addr {
  %a = tail call <4 x i64> @llvm.bswap.v4i64(<4 x i64> %x)
  %b = shufflevector <4 x i64> %a, <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i64> %b
}

declare <4 x i64> @llvm.bswap.v4i64(<4 x i64>)
```
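Here is why a single shuffle should be enough. Byte-swapping each 64-bit lane and then reversing the lane order is the same as reversing all 32 bytes. If output byte `i` sits in word `3 - k` at position `j` (so `i = 8(3 - k) + j`), it comes from byte `7 - j` of input word `k`, which is byte `8k + 7 - j = 31 - i`. The following is a minimal constexpr sketch of that claim. The names `Words`, `Bytes`, `swap64`, `to_bytes` and `is_full_byte_reverse` are hypothetical, and the byte numbering assumes the little-endian lane layout used on x86.

```cpp
// Hypothetical helper types: four 64-bit words, and the same value as 32 bytes.
struct Words { unsigned long long w[4]; };
struct Bytes { unsigned char b[32]; };

// Portable byte swap; gives the same result as __builtin_bswap64.
constexpr unsigned long long swap64(unsigned long long v) {
    unsigned long long r = 0;
    for (int j = 0; j < 8; ++j) {
        r = (r << 8) | ((v >> (8 * j)) & 0xFF);
    }
    return r;
}

// Mirrors bs() from the snippet above: bswap each word, reverse word order.
constexpr Words bs(Words x) {
    Words o{};
    for (int k = 0; k < 4; ++k) {
        o.w[3 - k] = swap64(x.w[k]);
    }
    return o;
}

// Byte i is byte (i % 8) of word i / 8, counting from the least significant
// byte (assumed little-endian layout of <4 x i64>, as on x86).
constexpr Bytes to_bytes(Words x) {
    Bytes r{};
    for (int i = 0; i < 32; ++i) {
        r.b[i] = (x.w[i / 8] >> (8 * (i % 8))) & 0xFF;
    }
    return r;
}

// True if bs() is exactly "reverse all 32 bytes" for this input.
constexpr bool is_full_byte_reverse(Words x) {
    Bytes in = to_bytes(x);
    Bytes out = to_bytes(bs(x));
    for (int i = 0; i < 32; ++i) {
        if (out.b[i] != in.b[31 - i]) return false;
    }
    return true;
}
```

Because everything is constexpr, a check such as `static_assert(is_full_byte_reverse(Words{{1, 2, 3, 4}}), "");` should be evaluated entirely at compile time.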
I think that on x86 one or two `pshufb` instructions should be enough to implement all of it, but instead we get:
```asm
.LCPI0_0:
.byte 7 # 0x7
.byte 6 # 0x6
.byte 5 # 0x5
.byte 4 # 0x4
.byte 3 # 0x3
.byte 2 # 0x2
.byte 1 # 0x1
.byte 0 # 0x0
.byte 15 # 0xf
.byte 14 # 0xe
.byte 13 # 0xd
.byte 12 # 0xc
.byte 11 # 0xb
.byte 10 # 0xa
.byte 9 # 0x9
.byte 8 # 0x8
bswap256: # @bswap256
vmovdqa xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8]
vpshufb xmm2, xmm0, xmm1
vextractf128 xmm0, ymm0, 1
vpshufb xmm0, xmm0, xmm1
vinsertf128 ymm0, ymm0, xmm2, 1
vshufpd ymm0, ymm0, ymm0, 5 # ymm0 = ymm0[1,0,3,2]
ret
```
https://godbolt.org/z/ahKdM6GGr
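To sanity-check the "one or two shuffles" intuition on AVX2, here is a scalar model of two instructions that appear sufficient. First, `vpermq` with immediate `0x4E` swaps the two 128-bit halves. Then a single `vpshufb` reverses the bytes within each half. The cross-lane step is needed because `vpshufb` on a `ymm` register never moves bytes between 128-bit lanes. This is only a sketch of instruction semantics under that assumption, not what the backend currently emits. The helper names (`Ymm`, `emu_vpermq`, `emu_vpshufb`, `reverse32`) are hypothetical.

```cpp
// Hypothetical scalar model of a 256-bit register as 32 bytes.
struct Ymm { unsigned char b[32]; };

// vpermq ymm, ymm, imm8: result qword q is source qword ((imm >> 2q) & 3).
constexpr Ymm emu_vpermq(Ymm src, int imm) {
    Ymm r{};
    for (int q = 0; q < 4; ++q) {
        int from = (imm >> (2 * q)) & 3;
        for (int j = 0; j < 8; ++j) r.b[8 * q + j] = src.b[8 * from + j];
    }
    return r;
}

// vpshufb ymm, ymm, ymm: shuffles only within each 128-bit lane, using the
// low 4 bits of each index byte; an index byte with bit 7 set yields zero.
constexpr Ymm emu_vpshufb(Ymm src, Ymm idx) {
    Ymm r{};
    for (int i = 0; i < 32; ++i) {
        int lane_base = (i / 16) * 16;
        r.b[i] = (idx.b[i] & 0x80) ? 0 : src.b[lane_base + (idx.b[i] & 0x0F)];
    }
    return r;
}

// Full 32-byte reverse: swap halves (imm 0x4E selects qwords 2,3,0,1),
// then reverse bytes inside each half with the mask 15..0, 15..0.
constexpr Ymm reverse32(Ymm x) {
    Ymm mask{};
    for (int i = 0; i < 32; ++i) mask.b[i] = 15 - i % 16;
    return emu_vpshufb(emu_vpermq(x, 0x4E), mask);
}

// Checks reverse32 on the input 0, 1, ..., 31.
constexpr bool reverse32_ok() {
    Ymm x{};
    for (int i = 0; i < 32; ++i) x.b[i] = i;
    Ymm y = reverse32(x);
    for (int i = 0; i < 32; ++i) {
        if (y.b[i] != 31 - i) return false;
    }
    return true;
}
```

Under this model, the whole `bswap256` function would be one lane swap plus one in-lane byte shuffle (or two `pshufb` on the 128-bit halves plus the half swap), compared with the five shuffle/extract/insert instructions above.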
Maybe this could be improved by combining `@llvm.bswap.v4i64` into a `shufflevector <32 x i8>` (with bitcasts to and from `<4 x i64>`)?
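The suggested combine can be phrased with byte masks. After a bitcast to `<32 x i8>`, `llvm.bswap.v4i64` is the byte shuffle that reverses each group of 8 bytes. The `<3, 2, 1, 0>` lane shuffle widens to a byte shuffle. Two single-source shuffles then compose into one, because `z[i] = y[second[i]] = x[first[second[i]]]`. The sketch below only computes the combined mask; it is not InstCombine code. It assumes no undef/poison mask elements and the little-endian bitcast layout. `Mask32`, `Lanes4`, `bswap_mask`, `widen` and `compose` are hypothetical names.

```cpp
// Hypothetical mask types: a <32 x i8> shuffle mask and a <4 x i64> lane mask.
struct Mask32 { int m[32]; };
struct Lanes4 { int l[4]; };

// llvm.bswap on <4 x i64>, seen through a bitcast to <32 x i8>:
// within each 8-byte lane, byte j comes from byte 7 - j.
constexpr Mask32 bswap_mask() {
    Mask32 r{};
    for (int i = 0; i < 32; ++i) r.m[i] = (i / 8) * 8 + (7 - i % 8);
    return r;
}

// Widen an i64 lane mask (e.g. <3, 2, 1, 0>) to the equivalent byte mask.
constexpr Mask32 widen(Lanes4 lanes) {
    Mask32 r{};
    for (int i = 0; i < 32; ++i) r.m[i] = lanes.l[i / 8] * 8 + i % 8;
    return r;
}

// shuffle(shuffle(x, first), second) == shuffle(x, compose(first, second))
// for single-source shuffles without undef/poison elements.
constexpr Mask32 compose(Mask32 first, Mask32 second) {
    Mask32 r{};
    for (int i = 0; i < 32; ++i) r.m[i] = first.m[second.m[i]];
    return r;
}

// The IR from the issue: bswap, then lane shuffle <3, 2, 1, 0>.
constexpr bool combined_is_reverse() {
    Mask32 c = compose(bswap_mask(), widen(Lanes4{{3, 2, 1, 0}}));
    for (int i = 0; i < 32; ++i) {
        if (c.m[i] != 31 - i) return false;
    }
    return true;
}
```

If this reasoning holds, the combined mask is simply `<31, 30, ..., 0>`, which the x86 backend could then lower as a single byte-reverse pattern.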
</pre>