<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/68810>68810</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Regression: no longer generating `vdpbf16ps` with `m32bcst` RHS
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:X86,
            llvm:codegen,
            regression,
            performance
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          bjacob
      </td>
    </tr>
</table>

<pre>
This is a performance regression in generated code: a peephole optimization that selected a useful instruction form is no longer applied. Compiler Explorer: https://godbolt.org/z/xsGGsov5W

Summary:

AVX-512 multiply-accumulate instructions have a variant whose RHS is a broadcast-scalar memory operand ("m32bcst"). These variants are not exposed 1:1 as intrinsics; instead, the compiler is expected to fold a sequence of intrinsics (broadcast the RHS, then a vector-to-vector FMA) into this single instruction. This issue is about that peephole optimization no longer happening.
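
To make the intended semantics concrete, here is a rough scalar sketch of what the broadcast-RHS form is expected to compute. The names `bf16_to_f32` and `dpbf16_ps_broadcast_rhs_model` are hypothetical and exist only for illustration. The sketch assumes each 32-bit lane accumulates two bf16 products and deliberately ignores the instruction's exact rounding, denormal handling, and evaluation order, so treat it as a reading aid rather than a bit-exact reference:

```c
#include <stdint.h>
#include <string.h>

/* bf16 is the upper 16 bits of an IEEE-754 binary32, so widening is a shift. */
static float bf16_to_f32(uint16_t bits) {
  uint32_t widened = (uint32_t)bits << 16;
  float f;
  memcpy(&f, &widened, sizeof f);
  return f;
}

/* Hypothetical scalar model (illustration only, not bit-exact):
 *   acc: 16 float lanes
 *   lhs: 32 bf16 values, two per float lane
 *   rhs: one 32-bit element (two bf16 values) shared by all 16 lanes,
 *        which is what the {1to16} broadcast expresses. */
static void dpbf16_ps_broadcast_rhs_model(float acc[16],
                                          const uint16_t lhs[32],
                                          const uint16_t rhs[2]) {
  for (int i = 0; i < 16; ++i) {
    acc[i] += bf16_to_f32(lhs[2 * i + 1]) * bf16_to_f32(rhs[1]);
    acc[i] += bf16_to_f32(lhs[2 * i + 0]) * bf16_to_f32(rhs[0]);
  }
}
```

This is why the testcase below broadcasts a 32-bit `float` loaded from a `uint16_t` pointer: one 32-bit element carries the pair of bf16 values that every lane reuses.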

This is a follow-up to the recently fixed issue #68117, which was about a compiler crash on the same source code. Now that the crash is fixed, we can test again, but because Clang 16 and 17 both crash on this input, it is hard to narrow the regression window for this issue.

Results by Clang version:

Clang version | result
--- | ---
Clang 15 | Generates optimal code
Clang 16 | Compiler crash (#68117)
Clang 17 | Compiler crash (#68117)
Clang 18 (trunk after #68117 was fixed) | Generates sub-optimal code

Testcase (see it in Compiler Explorer: https://godbolt.org/z/xsGGsov5W)

```c
#include <immintrin.h>
#include <stdint.h>

static __m512bh bitcast_16xf32_to_32xbf16(__m512 a) {
  return *(const __m512bh*)(&a);
}

__m512 iree_mm512_dpbf16_ps_broadcast_rhs(
    __m512 acc, __m512bh lhs, const uint16_t* rhs) {
  return _mm512_dpbf16_ps(
      acc, lhs,
      bitcast_16xf32_to_32xbf16(_mm512_set1_ps(*(const float*)rhs)));
}
```

Compile with these flags: `-O2 -mavx512f -mavx512bf16`

Sub-optimal result with current trunk (Clang 18):

```asm
iree_mm512_dpbf16_ps_broadcast_rhs:     # @iree_mm512_dpbf16_ps_broadcast_rhs
        vbroadcastss    zmm2, dword ptr [rdi]
        vdpbf16ps       zmm0, zmm1, zmm2
        ret
```

Optimal result with Clang 15:

```asm
iree_mm512_dpbf16_ps_broadcast_rhs:     # @iree_mm512_dpbf16_ps_broadcast_rhs
        vdpbf16ps       zmm0, zmm1, dword ptr [rdi]{1to16}
        ret
```

@RKSimon @alexey-bataev 
</pre>