<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/113400>113400</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AVX-512] Merge `vmovdq/vpblendm` on `xmm` registers into previous maskable instruction operating on `zmm` registers 
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Validark
      </td>
    </tr>
</table>

<pre>
In my real code, I have a situation where I use `vpermb` but only need the first 16 bytes of the result. I then want to blend those bytes with another vector, based on a mask computed from that other vector.

Here is what is emitted for Zen 5:

```asm
foo:
        vpcmpnleub      k1, xmm0, xmm1
        vpermb          zmm1, zmm3, zmm2
        vmovdqu8        xmm0 {k1}, xmm1
```

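I think it would be better to fold the `vmovdqu8` into the `vpermb` as a merge-masking write. Because `k1` comes from a 16-byte compare, only its low 16 bits can be set, so the upper 48 bytes of `zmm0` are left unchanged and are never read afterwards.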

```asm
foo:
        vpcmpnleub      k1, xmm0, xmm1
        vpermb  zmm0 {k1}, zmm3, zmm2
        ; just use xmm0 from this point on
```
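To sanity-check that the fold preserves semantics, here is a scalar model in Zig. It is a sketch: the helper names (`vpermbMergeMasked`, `permuteThenBlend`, `mergeMaskedPermute`) are hypothetical, it models the instruction semantics on plain arrays rather than calling any intrinsic, and the test inputs come from a simple LCG. It relies on the 512-bit `vpermb` using only the low 6 bits of each index byte.

```zig
const std = @import("std");

/// Scalar model of 512-bit `vpermb` with merge-masking: lane i receives
/// table[indices[i] & 63] if bit i of `mask` is set, otherwise fallback[i].
fn vpermbMergeMasked(table: [64]u8, indices: [64]u8, fallback: [64]u8, mask: u64) [64]u8 {
    var out: [64]u8 = undefined;
    for (&out, 0..) |*o, i| {
        const selected = ((mask >> @as(u6, @intCast(i))) & 1) == 1;
        o.* = if (selected) table[indices[i] & 63] else fallback[i];
    }
    return out;
}

/// Current codegen: unmasked `vpermb`, then `vmovdqu8 xmm0 {k1}, xmm1`.
fn permuteThenBlend(a: [16]u8, b: [16]u8, table: [64]u8, indices: [64]u8) [16]u8 {
    // All-ones mask: every lane comes from the table, so the fallback is unused.
    const permuted = vpermbMergeMasked(table, indices, [_]u8{0} ** 64, ~@as(u64, 0));
    var out: [16]u8 = undefined;
    for (&out, a, b, 0..) |*o, x, y, i| {
        o.* = if (x > y) permuted[i] else x;
    }
    return out;
}

/// Proposed codegen: `vpermb zmm0 {k1}, zmm3, zmm2`, then read xmm0.
fn mergeMaskedPermute(a: [16]u8, b: [16]u8, table: [64]u8, indices: [64]u8) [16]u8 {
    // k1 from the 16-lane unsigned compare: bits 16..63 are always zero.
    var k1: u64 = 0;
    for (a, b, 0..) |x, y, i| {
        if (x > y) k1 |= @as(u64, 1) << @as(u6, @intCast(i));
    }
    // zmm0 holds `a` in its low 16 bytes; the upper 48 bytes are arbitrary.
    var zmm0: [64]u8 = [_]u8{0xAA} ** 64;
    @memcpy(zmm0[0..16], &a);
    const result = vpermbMergeMasked(table, indices, zmm0, k1);
    return result[0..16].*;
}

test "merge-masked vpermb matches permute-then-blend" {
    var seed: u32 = 1;
    var round: usize = 0;
    while (round < 1000) : (round += 1) {
        var table: [64]u8 = undefined;
        var indices: [64]u8 = undefined;
        var a: [16]u8 = undefined;
        var b: [16]u8 = undefined;
        // Fill all inputs from a simple linear congruential generator.
        for ([_][]u8{ &table, &indices, &a, &b }) |buf| {
            for (buf) |*byte| {
                seed = seed *% 1103515245 +% 12345;
                byte.* = @truncate(seed >> 16);
            }
        }
        const expected = permuteThenBlend(a, b, table, indices);
        const actual = mergeMaskedPermute(a, b, table, indices);
        try std.testing.expectEqualSlices(u8, &expected, &actual);
    }
}
```

Running `zig test` on this file exercises both sequences on the same inputs; the upper 48 bytes of the model's `zmm0` are deliberately filled with junk to show they never reach the returned 16 bytes.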

Here is a minimal repro in Zig that produces the original assembly:

```zig
const std = @import("std");

export fn foo(a: @Vector(16, u8), b: @Vector(16, u8), table: @Vector(64, u8), indices: @Vector(64, u8)) @Vector(16, u8) {
    return std.simd.extract(vperm(table, indices, padWithUndefineds(@Vector(64, u8), a), @as(u16, @bitCast(a > b))), 0, 16);
}

fn vperm(table: @Vector(64, u8), indices: @Vector(64, u8), fallback: @Vector(64, u8), mask: u64) @Vector(64, u8) {
    return struct {
        extern fn @"llvm.x86.avx512.mask.permvar.qi.512"(@Vector(64, u8), @Vector(64, u8), @Vector(64, u8), u64) @Vector(64, u8);
    }.@"llvm.x86.avx512.mask.permvar.qi.512"(table, indices, fallback, mask);
}

fn padWithUndefineds(T: type, value: anytype) T {
    const padding_len = @typeInfo(T).vector.len - @typeInfo(@TypeOf(value)).vector.len;
    return if (padding_len == 0) value else std.simd.join(value, @as(@Vector(padding_len, u8), @splat(undefined)));
}
```

The equivalent LLVM IR:

```llvm
define dso_local <16 x i8> @foo(<16 x i8> %0, <16 x i8> %1, <64 x i8> %2, <64 x i8> %3) local_unnamed_addr {
Entry:
  %4 = icmp ugt <16 x i8> %0, %1
  %5 = tail call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %2, <64 x i8> %3)
  %6 = shufflevector <64 x i8> %5, <64 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %7 = select <16 x i1> %4, <16 x i8> %6, <16 x i8> %0
  ret <16 x i8> %7
}

declare <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8>, <64 x i8>)
```
</pre>