<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/81391">81391</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Excessive moving of SIMD registers on x86-64
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          dzaima
      </td>
    </tr>
</table>

<pre>
Apologies for the unreadable autogenerated C code, but hopefully that doesn't matter much for the issue in question.

The code [here](https://godbolt.org/z/vn3EP6391) compiles to assembly that contains this excerpt:

```asm
        vmovdqu ymmword ptr [rsp + 112], ymm2   # 32-byte Spill
        vmovdqa ymm2, ymm14
        vmovdqa xmm14, xmm1
        vmovdqa ymm1, ymm9
        vmovdqa ymm9, ymm8
        vmovdqa ymm8, ymm6
        vmovdqa ymm6, ymm5
        vmovdqa ymm5, ymm3
        vmovdqa xmm3, xmm12
        vmovq   xmm12, qword ptr [rbx + rax + 8] # xmm12 = mem[0],zero
        vmovdqa ymm4, ymm15
        vmovdqa ymm15, ymm13
        vmovd xmm13, edx
        vpinsrd xmm13, xmm13, ecx, 1
        vpmaxsd xmm12, xmm12, xmm13
        vmovq   xmm13, qword ptr [rbx + rax]    # xmm13 = mem[0],zero
        vpbroadcastd    xmm0, dword ptr [rip + .LCPI0_17] # xmm0 = [1,1,1,1]
        vpinsrd xmm0, xmm0, ecx, 0
        vpaddd xmm0, xmm13, xmm0
        vmovdqa ymm13, ymm15
        vmovdqa ymm15, ymm4
        vpunpcklqdq     xmm0, xmm0, xmm12       # xmm0 = xmm0[0],xmm12[0]
        vmovdqa xmm12, xmm3
        vmovdqa ymm3, ymm5
        vmovdqa ymm5, ymm6
        vmovdqa ymm6, ymm8
        vmovdqa ymm8, ymm9
        vmovdqa ymm9, ymm1
        vmovdqa xmm1, xmm14
        vmovdqa ymm14, ymm2
        vmovdqu ymm2, ymmword ptr [rsp + 112]   # 32-byte Reload
```

That's 9 registers moved forwards and then later moved back, none of which are used in the code in between (and no, there are no jumps into the middle of this).

The C code does include an `__asm__` mutating a `__m256i` (for the purpose of preventing merging simple shuffles into `vpermd`s to reduce register pressure), but that's in a different place and is assigned to `ymm7` which does not feature in the problematic excerpt (`vmovapd` is added within `OFENCE_V` just to demonstrate this). Nevertheless, replacing `OFENCE_V` with `#define OFENCE_V(X) X` gets rid of the problem (perhaps by chance).
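
For readers unfamiliar with this kind of fence, here is a minimal sketch of the idea on a scalar value. It is not the `OFENCE_V` from the linked code: `OFENCE_SCALAR` and `add_fenced` are hypothetical names, and the `"+r"` constraint (general-purpose register) stands in for what would presumably be an SSE/AVX register constraint such as `"+x"` on a `__m256i` operand. It assumes GCC/Clang extended inline asm.

```c
/* Hypothetical scalar analogue of an optimization fence (not the macro from
   the linked code). The asm body is empty, but the "+r" constraint tells the
   compiler that X is both read and written here, so X has to be materialized
   in a register and computations cannot be merged across this point. */
#define OFENCE_SCALAR(X) __asm__("" : "+r"(X))

/* Returns a + b, with the sum passed through the fence so the addition
   cannot be folded together with whatever the caller does next. */
int add_fenced(int a, int b) {
    int sum = a + b;
    OFENCE_SCALAR(sum); /* value is unchanged; only the optimizer is blocked */
    return sum;
}
```

Defining the macro as a plain pass-through (the equivalent of `#define OFENCE_V(X) X` above) removes the fence and lets the optimizer combine operations across that point again.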
</pre>