<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/71517">71517</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] Missed vectorisation opportunity (tsvc, s172)
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AArch64,
vectorization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
sjoerdmeijer
</td>
</tr>
</table>
<pre>
Clang does not vectorise kernel s172 from TSVC, and the generated code is about 3x slower than GCC's as a result. To reproduce, compile this input with `-O3 -ffast-math -mcpu=neoverse-v2`:
```
__attribute__((aligned(64))) float x[32000];
__attribute__((aligned(64))) float a[32000],b[32000],c[32000],d[32000],e[32000],
aa[256][256],bb[256][256],cc[256][256],tt[256][256];
int dummy(float[32000], float[32000], float[32000], float[32000], float[32000], float[256][256], float[256][256], float[256][256], float);
float s172(int xa, int xb)
{
int n1 = xa;
int n3 = xb;
for (int nl = 0; nl < 100000; nl++) {
for (int i = n1-1; i < 32000; i += n3) {
a[i] += b[i];
}
dummy(a, b, c, d, e, aa, bb, cc, 0.);
}
return 0.f; /* s172 is declared float, so return a value */
}
```
Clang's codegen:
```
.LBB0_3: // Parent Loop BB0_2 Depth=1
ldr s0, [x19, x8, lsl #2]
ldr s1, [x20, x8, lsl #2]
fadd s0, s1, s0
str s0, [x20, x8, lsl #2]
add x8, x8, x22
cmp x8, x23
b.lt .LBB0_3
```
GCC's codegen:
```
whilelo p7.s, wzr, w28
.L5:
ld1w z31.s, p7/z, [x19, x0, lsl 2]
ld1w z30.s, p7/z, [x27, x0, lsl 2]
fadd z31.s, z31.s, z30.s
st1w z31.s, p7, [x19, x0, lsl 2]
add x0, x0, x2
whilelo p7.s, w0, w28
b.any .L5
```
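Note that GCC's loop uses contiguous `ld1w`/`st1w` accesses (scalar base plus scalar index), which only match the source loop when the stride `n3` is 1. So GCC has presumably versioned the loop on `n3 == 1` and vectorised that copy; this is an inference from the assembly, not something confirmed here. A minimal source-level sketch of that versioning, where `LEN`, `s172_ref`, `s172_versioned` and `s172_forms_agree` are hypothetical names used only for illustration:
```c
#define LEN 32000

/* Reference: the strided update from s172, without the outer repeat loop.
   Assumes n1 >= 1 and n3 >= 1. */
static void s172_ref(float *dst, const float *src, int n1, int n3) {
    for (int i = n1 - 1; i < LEN; i += n3)
        dst[i] += src[i];
}

/* Hypothetical versioned form: a unit-stride copy that a vectoriser can
   handle with contiguous loads/stores, plus the original strided loop
   as the fallback. Assumed to mirror what GCC does, not confirmed. */
static void s172_versioned(float *dst, const float *src, int n1, int n3) {
    if (n3 == 1) {
        for (int i = n1 - 1; i < LEN; i++)
            dst[i] += src[i];
    } else {
        for (int i = n1 - 1; i < LEN; i += n3)
            dst[i] += src[i];
    }
}

/* Returns 1 when both forms produce identical arrays for the given arguments. */
static int s172_forms_agree(int n1, int n3) {
    static float x[LEN], y[LEN], src[LEN];
    for (int i = 0; i < LEN; i++) {
        x[i] = y[i] = (float)i;
        src[i] = 0.5f;
    }
    s172_ref(x, src, n1, n3);
    s172_versioned(y, src, n1, n3);
    for (int i = 0; i < LEN; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```
If this reading is right, the missing piece on the Clang side would be the runtime check on the stride rather than support for strided (gather/scatter) accesses.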
See also:
https://godbolt.org/z/W8eEPKqET
TODO: root cause analysis.
</pre>