<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/54651>54651</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] Improve vector multiply by constant
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ilinpv
      </td>
    </tr>
</table>

<pre>
    Multiplication by a constant can be synthesised from additions, subtractions and shifts to avoid expensive multiply instructions. The same can be done for vector multiplies, and LLVM has room for improvement here.

```
#define N 256
void
foo (unsigned long *arr)
{
  for (int i = 0; i < N; i++)
    arr[i] *= 5;
}
```
GCC at -O3 produces:
```
foo(unsigned long*):
        add     x1, x0, 2048
.L2:
        ldr     q1, [x0]
        shl     v0.2d, v1.2d, 2
        add     v0.2d, v0.2d, v1.2d
        str     q0, [x0], 16
        cmp     x1, x0
        bne     .L2
        ret
```
LLVM generates less efficient code, moving each lane through general-purpose registers:
```
foo(unsigned long*):                               // @foo(unsigned long*)
        mov     x8, xzr
.LBB0_1:                                // =>This Inner Loop Header: Depth=1
        add     x9, x0, x8
        ldp     q0, q1, [x9]
        add     x8, x8, #32                     // =32
        cmp     x8, #2048                       // =2048
        fmov    x11, d0
        mov     x10, v0.d[1]
        add     x11, x11, x11, lsl #2
        fmov    x12, d1
        add     x12, x12, x12, lsl #2
        mov     x13, v1.d[1]
        fmov    d0, x11
        add     x10, x10, x10, lsl #2
        fmov    d1, x12
        mov     v0.d[1], x10
        add     x10, x13, x13, lsl #2
        mov     v1.d[1], x10
        stp     q0, q1, [x9]
        b.ne    .LBB0_1
        ret
```
The FMOV/MOV transfers between SIMD and general-purpose registers are expensive, and the ADD+LSL operation can be done on the SIMD side instead, albeit with two instructions (SHL followed by ADD, as in the GCC output).
https://godbolt.org/z/axb1zrq5a
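
For reference, the identity behind both sequences above is x * 5 == (x << 2) + x. A minimal C sketch of it follows; the names `mul5_shift_add` and `foo_shift_add` are hypothetical and only illustrate the transformation, not how either compiler implements it:
```c
#include <stddef.h>
#include <stdint.h>

/* x * 5 rewritten as a shift plus an add. Unsigned arithmetic wraps
   modulo 2^64, so the result matches the multiply for every input. */
static inline uint64_t mul5_shift_add(uint64_t x)
{
    return (x << 2) + x;
}

/* Hypothetical helper: the same loop as foo, with the multiply
   written out by hand as the shift-and-add form. */
static void foo_shift_add(uint64_t *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        arr[i] = mul5_shift_add(arr[i]);
}
```
Each lane is independent, so the same two operations map directly onto the vector SHL and ADD that GCC emits.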
</pre>