<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/68311>68311</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Clang/LLVM won't generate AVX-512 moves even when using intrinsics
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:X86,
            llvm:codegen,
            performance
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          danilaml
      </td>
    </tr>
</table>

<pre>
    For the following C/C++ code:
```cpp
#include <immintrin.h>

typedef long long8 __attribute__((vector_size(8 * sizeof(long))));

void bar(long8 *a) {
  long8 val = {-1, -1, -1, -1, -1, -1, -1, -1};
  a[0] = val;
  a[1] = val;
  return;
}

void baz(long8 *a) {
  long8 val = {-1, -1, -1, -1, -1, -1, -1, -1};
  _mm512_store_epi64(a, val);
  _mm512_store_epi64(a + 1, val); // comment out this line to get a zmm store
  return;
}
```
Clang with `-O3 -mcpu=icelake-server -force-vector-width=512 -print-after-all -debug`
generates the following assembly:
```asm
bar(long __vector(8)*): # @bar(long __vector(8)*)
        vpcmpeqd        %ymm0, %ymm0, %ymm0
        vmovdqa %ymm0, 96(%rdi)
        vmovdqa %ymm0, 64(%rdi)
        vmovdqa %ymm0, 32(%rdi)
        vmovdqa %ymm0, (%rdi)
        vzeroupper
        retq
baz(long __vector(8)*): # @baz(long __vector(8)*)
        vpcmpeqd %ymm0, %ymm0, %ymm0
        vmovdqa %ymm0, 96(%rdi)
        vmovdqa %ymm0, 64(%rdi)
        vmovdqa %ymm0, 32(%rdi)
        vmovdqa %ymm0, (%rdi)
        vzeroupper
        retq
```

Godbolt link: https://godbolt.org/z/7fTPjfff8

But if you comment out the second store in `baz`, a zmm store is generated. It looks like several consecutive stores get folded into a `memset` intrinsic in IR, which is then expanded during SelectionDAG creation using the preferred vector width rather than the one actually requested. But even ignoring that, I'm pretty sure that zmm stores are more efficient than ymm stores in this case, even on targets that prefer 256-bit vectors, especially if the destination is 64-byte aligned.
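
To illustrate why that fold is legal, here is a minimal sketch (not the LLVM pass itself) showing that the two all-ones vector stores write the same bytes as a single `memset` with `0xFF`. The helper names `store_pair`, `store_memset`, `stores_match_memset` and `check_equivalence` are made up for this example; only `long8` comes from the code above.

```cpp
#include <cassert>
#include <cstring>

// Same vector type as in the report (GCC/Clang vector extension).
typedef long long8 __attribute__((vector_size(8 * sizeof(long))));

// Hypothetical helper: two consecutive all-ones vector stores, as in bar().
void store_pair(long8 *a) {
  long8 val = {-1, -1, -1, -1, -1, -1, -1, -1};
  a[0] = val;
  a[1] = val;
}

// Hypothetical helper: the single memset those two stores are equivalent to.
// Every byte of a -1 element is 0xFF, so one memset covers both vectors.
void store_memset(long8 *a) {
  std::memset(a, 0xFF, 2 * sizeof(long8));
}

// Returns true if both helpers leave identical bytes in memory.
bool stores_match_memset() {
  long8 x[2];
  long8 y[2];
  std::memset(x, 0, sizeof x);
  std::memset(y, 0, sizeof y);
  store_pair(x);
  store_memset(y);
  return std::memcmp(x, y, sizeof x) == 0;
}

// Hypothetical convenience wrapper that aborts if the equivalence fails.
void check_equivalence() {
  assert(stores_match_memset());
}
```

This only demonstrates the byte-level equivalence; it says nothing about which store width the backend picks when it expands the resulting `memset`.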

Somewhat related to https://github.com/llvm/llvm-project/issues/42585

Perhaps this line should be changed to use `Subtarget.useAVX512Regs()`?
https://github.com/llvm/llvm-project/blob/ca611affd3e5dfe00e6ebe0488994bf93c2d135c/llvm/lib/Target/X86/X86ISelLoweringCall.cpp#L285
</pre>