<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/55452>55452</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] Unnecessary round-trip through GPRs after AdvSIMD reduction ops
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          rygorous
      </td>
    </tr>
</table>

<pre>
The ARM intrinsics for the AdvSIMD vector reduction ops have a scalar return type, but the instructions themselves leave the result in a SIMD register. This has awkward consequences when the result is meant to feed into further SIMD computations.

Generally, the generated code contains several extra instructions that move the result to a GPR and back, with a corresponding substantial increase in latency. A simple reproducer for Clang 14.0.0 targeting AArch64 is here: https://godbolt.org/z/MoeY4sYob

Ideally, the `fmov` instruction should disappear entirely and the `dup` and `mov`/`ins` should use the corresponding vector register as source.
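
For illustration, a minimal sketch of the kind of pattern described above. This is a hypothetical example, not the code behind the godbolt link; the function names `broadcast_sum` and `sum_and_broadcast` are made up here. It reduces a vector with `vaddvq_u32` and immediately broadcasts the scalar result back into a vector with `vdupq_n_u32`. The non-AArch64 branch is only a scalar fallback so the snippet builds on other targets.

```c
/* Hypothetical sketch; not the reproducer from the godbolt link. */
/* Quoted include form is used so the snippet needs no HTML escaping. */
#include "stdint.h"

#if defined(__aarch64__)
#include "arm_neon.h"

/* Sum all four lanes, then broadcast the sum to every lane.
   vaddvq_u32 returns a scalar uint32_t, so at the source level the
   value leaves the vector domain before vdupq_n_u32 brings it back. */
static uint32x4_t broadcast_sum(uint32x4_t v) {
    uint32_t sum = vaddvq_u32(v);
    return vdupq_n_u32(sum);
}

void sum_and_broadcast(const uint32_t in[4], uint32_t out[4]) {
    vst1q_u32(out, broadcast_sum(vld1q_u32(in)));
}
#else
/* Scalar fallback with the same observable behavior. */
void sum_and_broadcast(const uint32_t in[4], uint32_t out[4]) {
    uint32_t sum = in[0] + in[1] + in[2] + in[3];
    for (int i = 0; i != 4; ++i) {
        out[i] = sum;
    }
}
#endif
```

Calling `sum_and_broadcast` with the input `{1, 2, 3, 4}` fills `out` with four copies of 10. When built for AArch64 with optimizations, the assembly for `broadcast_sum` is the place to check whether the sum is moved into a `w` register and back, or whether `dup` reads the lane directly from the vector register that holds the reduction result.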
</pre>