<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/71511>71511</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] Missed vectorisation opportunity (tsvc, s112)
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AArch64,
vectorization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
sjoerdmeijer
</td>
</tr>
</table>
<pre>
With Clang top of tree, we are about 35% behind GCC 12 on our AArch64 platform for the s112 kernel from TSVC: GCC vectorises the kernel, Clang doesn't. Clang seems to consider vectorisation not worthwhile for this input with -O3 -ffast-math -mcpu=neoverse-v2:
```
__attribute__((aligned(64))) float x[32000];
__attribute__((aligned(64))) float a[32000],b[32000],c[32000],d[32000],e[32000],
aa[256][256],bb[256][256],cc[256][256],tt[256][256];
int dummy(float[32000], float[32000], float[32000], float[32000], float[32000], float[256][256], float[256][256], float[256][256], float);
float s112()
{
  for (int nl = 0; nl < 3*100000; nl++) {
    for (int i = 32000 - 2; i >= 0; i--) {
      a[i+1] = a[i] + b[i];
    }
    dummy(a, b, c, d, e, aa, bb, cc, 0.);
  }
}
```
Clang's codegen:
```
.LBB0_3: // Parent Loop BB0_2 Depth=1
add x10, x19, x8
ldr s0, [x9, #-8]!
sub x8, x8, #8
ldur s1, [x10, #-4]
fadd s2, s1, s0
ldur s0, [x9, #-4]
ldr s1, [x19, x8]
fadd s0, s1, s0
stp s0, s2, [x9]
cbnz x8, .LBB0_3
```
whereas GCC generates:
```
.L4:
ldr q30, [x27, x0]
ldr q28, [x20, x0]
mov v31.16b, v30.16b
mov v29.16b, v28.16b
tbl v30.16b, {v30.16b - v31.16b}, v27.16b
tbl v28.16b, {v28.16b - v29.16b}, v27.16b
fadd v30.4s, v30.4s, v28.4s
mov v31.16b, v30.16b
tbl v30.16b, {v30.16b - v31.16b}, v27.16b
str q30, [x19, x0]
sub x0, x0, #16
cmp x0, x28
bne .L4
```
See https://godbolt.org/z/6cfK9sbj5 for the reproducer (and this codegen).
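
For reference, a minimal sketch of why vectorising this loop is legal. Because i counts down, every a[i] is read before a later iteration overwrites it, so a block of 4 iterations can do all of its loads and adds first and all of its stores afterwards. The names s112_scalar, s112_blocked and s112_forms_agree, the block width of 4 and the small array size N are assumptions made for illustration; this is not what either compiler emits.
```c
/* Hypothetical stand-in for the 32000-element arrays in the reproducer;
   67 is chosen so the blocked loop also needs a scalar remainder. */
enum { N = 67 };

/* Scalar reference: the inner loop of s112 as written in the reproducer. */
static void s112_scalar(float *a, const float *b, int n)
{
    for (int i = n - 2; i >= 0; i--)
        a[i + 1] = a[i] + b[i];
}

/* Blocked form: each group of 4 iterations reads a[i-3..i] and b[i-3..i]
   first, then writes a[i-2..i+1]. Earlier (higher-i) blocks only wrote
   indices >= i+2, so every read still sees the original value of a. */
static void s112_blocked(float *a, const float *b, int n)
{
    int i = n - 2;
    for (; i >= 3; i -= 4) {
        float t[4];
        for (int k = 0; k < 4; k++)      /* load and add phase */
            t[k] = a[i - k] + b[i - k];
        for (int k = 0; k < 4; k++)      /* store phase */
            a[i - k + 1] = t[k];
    }
    for (; i >= 0; i--)                  /* scalar remainder */
        a[i + 1] = a[i] + b[i];
}

/* Returns 1 if both forms produce identical arrays, 0 otherwise. */
static int s112_forms_agree(void)
{
    float a1[N], a2[N], b[N];
    for (int i = 0; i < N; i++) {
        a1[i] = a2[i] = (float)i * 0.5f;
        b[i] = (float)(N - i);
    }
    s112_scalar(a1, b, N);
    s112_blocked(a2, b, N);
    for (int i = 0; i < N; i++)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}
```
Each element is computed by the same single addition in both forms, so the results should match exactly, independent of -ffast-math.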
</pre>