<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/54651>54651</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] Improve vector multiply by constant
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
ilinpv
</td>
</tr>
</table>
<pre>
Multiplication by a constant can be synthesised from additions, subtractions and shifts to avoid an expensive multiply instruction. The same can be done on the vector side, and LLVM's AArch64 codegen can be improved here.
```
#define N 256
void
foo (unsigned long *arr)
{
  for (int i = 0; i < N; i++)
    arr[i] *= 5;
}
```
GCC at -O3 produces:
```
foo(unsigned long*):
add x1, x0, 2048
.L2:
ldr q1, [x0]
shl v0.2d, v1.2d, 2
add v0.2d, v0.2d, v1.2d
str q0, [x0], 16
cmp x1, x0
bne .L2
ret
```
LLVM generates less efficient code, moving each lane to a general-purpose register and back:
```
foo(unsigned long*): // @foo(unsigned long*)
mov x8, xzr
.LBB0_1: // =>This Inner Loop Header: Depth=1
add x9, x0, x8
ldp q0, q1, [x9]
add x8, x8, #32 // =32
cmp x8, #2048 // =2048
fmov x11, d0
mov x10, v0.d[1]
add x11, x11, x11, lsl #2
fmov x12, d1
add x12, x12, x12, lsl #2
mov x13, v1.d[1]
fmov d0, x11
add x10, x10, x10, lsl #2
fmov d1, x12
mov v0.d[1], x10
add x10, x13, x13, lsl #2
mov v1.d[1], x10
stp q0, q1, [x9]
b.ne .LBB0_1
ret
```
The FMOV/MOV transfers between SIMD and general-purpose registers are expensive, and the ADD with LSL can be done on the SIMD side instead, albeit with two instructions (SHL and ADD, as in the GCC output).
https://godbolt.org/z/axb1zrq5a
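
For reference, the shift-and-add form can be written out in plain C. This is only a minimal sketch of the identity x * 5 == (x << 2) + x, not a proposed patch; the names foo_mul, foo_shift_add and forms_agree are made up for illustration and do not appear in the reproducer.

```c
#define N 256

/* Reference form: plain multiply, as in the reproducer above. */
void foo_mul(unsigned long *arr)
{
    for (int i = 0; i < N; i++)
        arr[i] *= 5;
}

/* Shift-and-add form: x * 5 == (x << 2) + x, modulo the width of
   unsigned long. Per lane this is the SHL + ADD pair GCC emits. */
void foo_shift_add(unsigned long *arr)
{
    for (int i = 0; i < N; i++)
        arr[i] = (arr[i] << 2) + arr[i];
}

/* Hypothetical helper: returns 1 if both forms agree on a test pattern,
   including values large enough to wrap around. */
int forms_agree(void)
{
    unsigned long a[N], b[N];
    for (int i = 0; i < N; i++)
        a[i] = b[i] = (unsigned long)i * 2654435761ul + ~0ul / 3;
    foo_mul(a);
    foo_shift_add(b);
    for (int i = 0; i < N; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Because unsigned arithmetic wraps, the two forms agree for every input, so the vector lowering needs no extra overflow handling.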
</pre>