<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/84182>84182</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            AVX-512-VNNI instruction not generated when `target-cpu=znver4`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:X86,
            llvm:codegen
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          bjacob
      </td>
    </tr>
</table>

<pre>
    Filing this issue with two Compiler Explorer testcases: one in LLVM IR and the other in C.

# LLVM IR testcase

Compiler Explorer link: https://godbolt.org/z/3Wf1cfEo1

Problem: when the parent function has `"target-cpu"="znver4"`, the `@llvm.x86.avx512.vpdpwssd.512` intrinsic fails to compile to the expected AVX-512-VNNI `vpdpwssd` instruction and instead generates a (`vpmaddwd`, `vpaddd`) fallback implementation.

The problem only reproduces with `"target-cpu"="znver4"` and not with `"target-cpu"="cascadelake"` or when `"target-cpu"` is simply omitted, as shown in the Compiler Explorer link.

Inlining the LLVM IR testcase here for completeness:

```llvm
; Testing with "target-cpu"="znver4" in attributes #0 below. Nothing else changes between testcases.

define dso_local <8 x i64> @foo(<8 x i64> noundef %0, <8 x i64> noundef %1, <8 x i64> noundef %2) local_unnamed_addr #0 {
  %4 = bitcast <8 x i64> %0 to <16 x i32>
  %5 = bitcast <8 x i64> %1 to <16 x i32>
  %6 = bitcast <8 x i64> %2 to <16 x i32>
  %7 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %4, <16 x i32> %5, <16 x i32> %6)
  %8 = bitcast <16 x i32> %7 to <8 x i64>
  ret <8 x i64> %8
}

declare <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32>, <16 x i32>, <16 x i32>) #1

declare void @llvm.dbg.value(metadata, metadata, metadata) #2

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "min-legal-vector-width"="512" "no-trapping-math"="true" "stack-protector-buffer-size"="8"
  "target-cpu"="znver4" "target-features"="+adx,+aes,+avx,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+crc32,+cx16,+cx8,+evex512,+f16c,+fma,+fsgsbase,+fxsr,+gfni,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
```

# C testcase

Compiler Explorer link: https://godbolt.org/z/z6xxf35hG

Problem: when compiling with `-march=znver4`, the `_mm512_dpwssd_epi32` intrinsic fails to compile to the expected AVX-512-VNNI `vpdpwssd` instruction and instead generates a (`vpmaddwd`, `vpaddd`) fallback implementation.

The problem only reproduces with `-march=znver4` and not with `-march=cascadelake` or when `-march` is omitted and the individual AVX-512 features are passed instead, as shown in the Compiler Explorer link.

Inlining the C testcase here for completeness:

```c
#include <immintrin.h>

__m512i foo(__m512i x, __m512i y, __m512i z) {
 return _mm512_dpwssd_epi32(x, y, z);
}
```
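
For reference, here is a minimal scalar sketch of what one 32-bit lane computes, to make clear why the (`vpmaddwd`, `vpaddd`) pair gives the same result as `vpdpwssd` and so only differs in instruction count. The function names `dpwssd_lane`, `pmaddwd_lane` and `fallback_lane` are hypothetical helpers written for this illustration, not LLVM or intrinsic APIs, and the model assumes the usual two's-complement conversion from `uint16_t` to `int16_t`.

```c
#include <stdint.h>

/* One 32-bit lane of the non-saturating vpdpwssd:
 * dst = src + a.lo16 * b.lo16 + a.hi16 * b.hi16, wrapping modulo 2^32.
 * Hypothetical helper, for illustration only. */
uint32_t dpwssd_lane(uint32_t src, uint32_t a, uint32_t b) {
    int16_t a_lo = (int16_t)(uint16_t)a, a_hi = (int16_t)(uint16_t)(a >> 16);
    int16_t b_lo = (int16_t)(uint16_t)b, b_hi = (int16_t)(uint16_t)(b >> 16);
    /* Accumulate in 64 bits so no signed overflow occurs, then truncate. */
    int64_t acc = (int64_t)src + (int64_t)a_lo * b_lo + (int64_t)a_hi * b_hi;
    return (uint32_t)acc;
}

/* One 32-bit lane of vpmaddwd: sum of the two signed 16-bit products. */
uint32_t pmaddwd_lane(uint32_t a, uint32_t b) {
    int16_t a_lo = (int16_t)(uint16_t)a, a_hi = (int16_t)(uint16_t)(a >> 16);
    int16_t b_lo = (int16_t)(uint16_t)b, b_hi = (int16_t)(uint16_t)(b >> 16);
    int32_t p_lo = (int32_t)a_lo * b_lo;
    int32_t p_hi = (int32_t)a_hi * b_hi;
    /* Add as unsigned so the sum wraps instead of overflowing. */
    return (uint32_t)p_lo + (uint32_t)p_hi;
}

/* The fallback sequence: vpmaddwd followed by vpaddd with the accumulator. */
uint32_t fallback_lane(uint32_t src, uint32_t a, uint32_t b) {
    return src + pmaddwd_lane(a, b);
}
```

Both functions return the same value for every input, so the generated code is correct; the issue is only that two instructions are emitted where one VNNI instruction is expected.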

</pre>