<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/62688>62688</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Vector bswap should be combined to shufflevector 
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            llvm:instcombine,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          chfast
      </td>
    </tr>
</table>

<pre>
    The motivation is C++ code like this:

```cpp
using uint64_t = unsigned long;

struct U256
{
    uint64_t w[4];
};

U256 bs(const U256& x)
{
    U256 o;
    o.w[3] = __builtin_bswap64(x.w[0]);
    o.w[2] = __builtin_bswap64(x.w[1]);
    o.w[1] = __builtin_bswap64(x.w[2]);
    o.w[0] = __builtin_bswap64(x.w[3]);
    return o;
}
```

A simplified IR representation looks something like this:

```llvm
declare <4 x i64> @llvm.bswap.v4i64(<4 x i64>)

define <4 x i64> @bswap256(<4 x i64> %x) local_unnamed_addr {
  %a = tail call <4 x i64> @llvm.bswap.v4i64(<4 x i64> %x)
  %b = shufflevector <4 x i64> %a, <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i64> %b
}
```

I think on x86 one or two `pshufb` instructions should be enough to implement all of it, but we get:

```asm
.LCPI0_0:
        .byte   7                               # 0x7
        .byte   6                               # 0x6
        .byte   5                               # 0x5
        .byte   4                               # 0x4
        .byte   3                               # 0x3
        .byte   2                               # 0x2
        .byte   1                               # 0x1
        .byte   0                               # 0x0
        .byte   15                              # 0xf
        .byte   14                              # 0xe
        .byte   13                              # 0xd
        .byte   12                              # 0xc
        .byte   11                              # 0xb
        .byte   10                              # 0xa
        .byte   9                               # 0x9
        .byte   8                               # 0x8
bswap256:                               # @bswap256
        vmovdqa xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8]
        vpshufb xmm2, xmm0, xmm1
        vextractf128    xmm0, ymm0, 1
        vpshufb xmm0, xmm0, xmm1
        vinsertf128     ymm0, ymm0, xmm2, 1
        vshufpd ymm0, ymm0, ymm0, 5             # ymm0 = ymm0[1,0,3,2]
        ret
```

https://godbolt.org/z/ahKdM6GGr

Maybe this can be improved by combining `@llvm.bswap.v4i64` into a `shufflevector` on `<32 x i8>`?
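
To make the suggestion concrete, here is a rough C++17 sketch of what the combined byte mask would look like. It models a shuffle mask as `out[i] = in[mask[i]]`, expresses the bswap and the element reversal as two `<32 x i8>` masks, and composes them. All helper names are made up for illustration and are not LLVM APIs; this is only a model of the idea, not how InstCombine would implement it.

```cpp
#include <array>
#include <cstddef>

using Mask = std::array<int, 32>;  // models a <32 x i8> shuffle mask: out[i] = in[mask[i]]

// Hypothetical helper: byte mask equivalent to @llvm.bswap.v4i64,
// i.e. reverse the 8 bytes inside each i64 element.
constexpr Mask bswap_v4i64_mask()
{
    Mask m{};
    for (std::size_t i = 0; i < 32; ++i)
        m[i] = static_cast<int>((i / 8) * 8 + (7 - i % 8));
    return m;
}

// Hypothetical helper: the element shuffle <3, 2, 1, 0> expanded to bytes.
constexpr Mask reverse_elements_mask()
{
    Mask m{};
    for (std::size_t i = 0; i < 32; ++i)
        m[i] = static_cast<int>((3 - i / 8) * 8 + i % 8);
    return m;
}

// Applying `first` and then `second` is one shuffle with mask first[second[i]].
constexpr Mask compose(const Mask& first, const Mask& second)
{
    Mask m{};
    for (std::size_t i = 0; i < 32; ++i)
        m[i] = first[static_cast<std::size_t>(second[i])];
    return m;
}

// Mask for the whole @bswap256 body: bswap, then reverse the elements.
constexpr Mask combined_mask()
{
    return compose(bswap_v4i64_mask(), reverse_elements_mask());
}

// The combined mask is a plain reversal of all 32 bytes.
static_assert(combined_mask()[0] == 31 && combined_mask()[31] == 0);
```

Under this model the two operations collapse into the single mask `<31, 30, ..., 1, 0>`, so the whole function is one byte-reversing shuffle of the 256-bit value.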

</pre>