<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/81112>81112</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64][SVE] Clang cannot vectorize TSVC s235, but GCC can
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
m-saito-fj
</td>
</tr>
</table>
<pre>
Clang cannot vectorize TSVC s235 for SVE, but GCC 13.2.0 can.
Options:
`-Ofast -march=armv8.2-a+sve`
```c
#define LEN 32000
#define LEN2 256
static int ntimes = 200000;
float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];
float aa[LEN2][LEN2], bb[LEN2][LEN2], cc[LEN2][LEN2], dd[LEN2][LEN2];
int dummy(float[LEN], float[LEN], float[LEN], float[LEN], float[LEN],
          float[LEN2][LEN2], float[LEN2][LEN2], float[LEN2][LEN2], float);
int s235()
{
    for (int nl = 0; nl < 200*(ntimes/LEN2); nl++) {
        for (int i = 0; i < LEN2; i++) {
            a[i] += b[i] * c[i];
            for (int j = 1; j < LEN2; j++) {
                aa[j][i] = aa[j-1][i] + bb[j][i] * a[i];
            }
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }
    return 0;
}
```
See also (Clang vs GCC):
https://godbolt.org/z/KeeK58oz7
GCC result:
```asm
.L4:
        add x8, x9, 1024
        mov x0, 0
        lsl x11, x12, 2
        add x13, x20, x11
        add x15, x1, x11
        ld1w z1.s, p0/z, [x13]
        ld1w z0.s, p0/z, [x15]
        add x11, x2, x11
        ld1w z2.s, p0/z, [x11]
        fmad z2.s, p1/m, z0.s, z1.s
        st1w z2.s, p0, [x13]
.L3:
        ld1w z1.s, p0/z, [x9, x0, lsl 2]
        ld1w z0.s, p0/z, [x10, x0, lsl 2]
        fmad z0.s, p1/m, z2.s, z1.s
        st1w z0.s, p0, [x8, x0, lsl 2]
        add x0, x0, 256
        cmp x0, x19
        bne .L3
        add x12, x12, x16
        add x9, x9, x21
        add x10, x10, x21
        whilelo p0.s, w12, w14
        b.any .L4
```
From this output, GCC appears to vectorize the outer loop over `i`.
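To make that reading concrete, here is a minimal scalar sketch of the loop shape I believe the GCC output corresponds to. It is a model, not GCC's actual transformation: it assumes a hypothetical fixed lane count `VL` that divides `LEN2`, whereas the real SVE code uses a run-time vector length and the `whilelo` predicate. The names `VL`, `init`, `s235_ref`, `s235_outer_model`, and `model_matches_reference` are mine, for illustration only. Inputs are integer-valued so that every float operation is exact and the comparison does not depend on whether the compiler fuses the multiply and add.

```c
#define LEN2 256
#define VL 8 /* hypothetical lane count; must divide LEN2 in this sketch */

static float a[LEN2], b[LEN2], c[LEN2];
static float aa[LEN2][LEN2], bb[LEN2][LEN2];
static float a_ref[LEN2], aa_ref[LEN2][LEN2];

/* Fill inputs with small integer values so all float arithmetic is exact. */
static void init(void) {
    for (int i = 0; i < LEN2; i++) {
        a[i] = a_ref[i] = 1.0f;
        b[i] = (float)(i % 4);
        c[i] = (float)(i % 3);
        for (int j = 0; j < LEN2; j++) {
            aa[j][i] = aa_ref[j][i] = (float)((i + j) % 5);
            bb[j][i] = (float)((i * 3 + j) % 4);
        }
    }
}

/* One pass of the original s235 loop nest (no dummy call). */
static void s235_ref(void) {
    for (int i = 0; i < LEN2; i++) {
        a_ref[i] += b[i] * c[i];
        for (int j = 1; j < LEN2; j++)
            aa_ref[j][i] = aa_ref[j-1][i] + bb[j][i] * a_ref[i];
    }
}

/* Scalar model of vectorizing over i: each loop over l stands for one
   vector operation on VL consecutive values of i. */
static void s235_outer_model(void) {
    for (int i0 = 0; i0 < LEN2; i0 += VL) {
        for (int l = 0; l < VL; l++)            /* like the fmad/st1w in .L4 */
            a[i0 + l] += b[i0 + l] * c[i0 + l];
        for (int j = 1; j < LEN2; j++)          /* like the .L3 loop */
            for (int l = 0; l < VL; l++)
                aa[j][i0 + l] = aa[j-1][i0 + l] + bb[j][i0 + l] * a[i0 + l];
    }
}

/* Returns 1 if the model reproduces the reference results exactly. */
static int model_matches_reference(void) {
    init();
    s235_ref();
    s235_outer_model();
    for (int i = 0; i < LEN2; i++) {
        if (a[i] != a_ref[i]) return 0;
        for (int j = 0; j < LEN2; j++)
            if (aa[j][i] != aa_ref[j][i]) return 0;
    }
    return 1;
}
```

In this form the dependence on `aa[j-1][i]` runs only along `j`, so the `VL` lanes of `i` handled together never depend on each other.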
`-mllvm -debug-only=loop-vectorize` messages:
```
LV: Checking a loop in 's235' from s235.c:19:4
LV: Loop hints: force=? width=vscale x 0 interleave=0
LV: Found a loop: for.body13
LV: Not vectorizing: Found an unidentified PHI %3 = phi float [ %.pre, %for.body4 ], [ %add25, %for.body13 ], !dbg !30
LV: Interleaving disabled by the pass manager
LV: Can't vectorize the instructions or CFG
```
LLVM does not appear to handle vectorization of loop nests of the form found in s235.
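My guess, which I have not confirmed against the IR, is that the PHI in the log is the value of `aa[j-1][i]` carried from one iteration of the `j` loop to the next. The sketch below shows that shape in C for a single column. It is a simplification: the column is stored contiguously here, while s235 strides by `LEN2`, and `scale` stands in for `a[i]`. The names `inner_direct`, `inner_carried`, `prev`, and `carried_matches_direct` are hypothetical and not taken from LLVM.

```c
#define LEN2 256

static float col_direct[LEN2], col_carried[LEN2], bvals[LEN2];

/* Inner j loop as written in s235, for one fixed column. */
static void inner_direct(float *col, const float *bb_col, float scale) {
    for (int j = 1; j < LEN2; j++)
        col[j] = col[j-1] + bb_col[j] * scale;
}

/* Same loop with the carried value held in a scalar. prev plays the role
   of the PHI: it is updated from its own previous value, and every
   intermediate value is also stored. */
static void inner_carried(float *col, const float *bb_col, float scale) {
    float prev = col[0];
    for (int j = 1; j < LEN2; j++) {
        prev = prev + bb_col[j] * scale; /* depends on the previous iteration */
        col[j] = prev;                   /* intermediate result is observable */
    }
}

/* Returns 1 if both forms produce identical columns. Inputs are small
   integers, so all float arithmetic here is exact. */
static int carried_matches_direct(void) {
    for (int j = 0; j < LEN2; j++) {
        bvals[j] = (float)(j % 4);
        col_direct[j] = col_carried[j] = 0.0f;
    }
    col_direct[0] = col_carried[0] = 1.0f;
    inner_direct(col_direct, bvals, 2.0f);
    inner_carried(col_carried, bvals, 2.0f);
    for (int j = 0; j < LEN2; j++)
        if (col_direct[j] != col_carried[j]) return 0;
    return 1;
}
```

Because each `prev` needs the one before it, the `j` loop cannot be split into independent lanes, which would explain why only vectorizing over `i` helps here.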
</pre>