<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/80139>80139</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64][SVE] Cannot be vectorized because the dependent distance is not constant (TSVC, s114)
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
m-saito-fj
</td>
</tr>
</table>
<pre>
Clang cannot vectorize TSVC s114 for SVE, but GCC 13.2.0 can.
Options:
`-Ofast -march=armv8.2-a+sve`
```c
#define LEN 32000
#define LEN2 256
static int ntimes = 200000;
float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];
float aa[LEN2][LEN2], bb[LEN2][LEN2], cc[LEN2][LEN2], dd[LEN2][LEN2];
int dummy(float[LEN], float[LEN], float[LEN], float[LEN], float[LEN],
          float[LEN2][LEN2], float[LEN2][LEN2], float[LEN2][LEN2], float);

int s114()
{
    for (int nl = 0; nl < 200*(ntimes/(LEN2)); nl++) {
        for (int i = 0; i < LEN2; i++) {
            for (int j = 0; j < i; j++) {
                aa[i][j] = aa[j][i] + bb[i][j];
            }
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }
    return 0;
}
```
See also (Clang vs GCC):
https://godbolt.org/z/frcoG3xz5
IR of the innermost loop body (`for.body8`):
```llvm
for.body8: ; preds = %for.body8.preheader, %for.body8
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body8 ], [ 0, %for.body8.preheader ]
%arrayidx10 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv, i64 %indvars.iv40, !dbg !27
%0 = load float, ptr %arrayidx10, align 4, !dbg !27, !tbaa !28
%arrayidx14 = getelementptr inbounds [256 x [256 x float]], ptr @bb, i64 0, i64 %indvars.iv40, i64 %indvars.iv, !dbg !32
%1 = load float, ptr %arrayidx14, align 4, !dbg !32, !tbaa !28
%add = fadd fast float %1, %0, !dbg !33
%arrayidx18 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv40, i64 %indvars.iv, !dbg !34
store float %add, ptr %arrayidx18, align 4, !dbg !35, !tbaa !28
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !36
%exitcond.not = icmp eq i64 %indvars.iv.next, %indvars.iv40, !dbg !15
br i1 %exitcond.not, label %for.cond.cleanup7.loopexit, label %for.body8, !dbg !16, !llvm.loop !37
```
`-mllvm -debug-only=loop-accesses` messages:
```
LAA: Found a runtime check ptr: %arrayidx18 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv40, i64 %indvars.iv, !dbg !34
LAA: Found a runtime check ptr: %arrayidx10 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv, i64 %indvars.iv40, !dbg !27
LAA: We need to do 0 pointer comparisons.
LAA: May be able to perform a memory runtime check if needed.
LAA: Checking memory dependencies
LAA: Src Scev: {{@aa,+,4}<nw><%for.cond5.preheader>,+,1024}<nuw><%for.body8>Sink Scev: {{@aa,+,1024}<nw><%for.cond5.preheader>,+,4}<nuw><%for.body8>(Induction step: 256)
LAA: Distance for %0 = load float, ptr %arrayidx10, align 4, !dbg !27, !tbaa !28 to store float %add, ptr %arrayidx18, align 4, !dbg !35, !tbaa !28: {{0,+,1020}<%for.cond5.preheader>,+,-1020}<%for.body8>
Pointer access with non-constant stride
Total Dependences: 1
LAA: unsafe dependent memory operations in loop
```
The direct cause seems to be that the dependence distance between the load of aa[j][i] and the store of aa[i][j] is not constant.
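To make the reported distance concrete, here is a minimal sketch that recomputes it from the array layout. It assumes row-major `float[256][256]` with a 4-byte `float`; the helper names `dep_distance_bytes` and `check_s114_distances` are hypothetical and are not part of TSVC or LLVM. Under those assumptions the byte distance from the load to the store is 1020 * (i - j), which matches the `{{0,+,1020},+,-1020}` expression in the LAA output, and it is never zero because j < i.
```c
#define LEN2 256

/* Byte distance from the load of aa[j][i] to the store of aa[i][j],
   assuming a row-major float[LEN2][LEN2] array. */
long dep_distance_bytes(long i, long j)
{
    long store_off = (i * LEN2 + j) * (long)sizeof(float);
    long load_off  = (j * LEN2 + i) * (long)sizeof(float);
    return store_off - load_off;
}

/* Returns 1 if, over the whole s114 iteration space (j < i), the distance
   equals 1020 * (i - j) and is strictly positive; returns 0 otherwise.
   The constant 1020 assumes sizeof(float) == 4. */
int check_s114_distances(void)
{
    for (long i = 0; i < LEN2; i++) {
        for (long j = 0; j < i; j++) {
            long d = dep_distance_bytes(i, j);
            if (d != 1020 * (i - j) || d <= 0)
                return 0;
        }
    }
    return 1;
}
```
For a fixed i, every store goes to row i while every load comes from a row j < i, so the inner loop never reads an element that it writes, even though the distance changes on every iteration.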
On the other hand, GCC 13 is able to vectorize the loop using predicated (masked) loads and stores, with a gather load for the strided aa[j][i] access.
```asm
.L4:
ld1w z31.s, p7/z, [x8, z29.s, uxtw]
ld1w z30.s, p7/z, [x11, x0, lsl 2]
fadd z31.s, z31.s, z30.s
st1w z31.s, p7, [x10, x0, lsl 2]
add x8, x8, x23
add x0, x0, x12
whilelo p7.s, w0, w9
b.any .L4
```
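The shape of the GCC loop can be emulated in plain C as a rough sketch: each step handles a block of lanes under a `whilelo`-style predicate, gathers the column access aa[j][i], loads bb[i][j] contiguously, and stores only the active lanes. `VL` is a hypothetical fixed lane count chosen for illustration (the real SVE vector length is implementation-defined), and the function names are assumptions rather than anything taken from GCC or TSVC.
```c
#include <string.h>

#define LEN2 256
#define VL 8 /* hypothetical lane count, for illustration only */

static float aa_ref[LEN2][LEN2], aa_vec[LEN2][LEN2], bb[LEN2][LEN2];

/* Fill both copies of aa and bb with small, exactly representable values. */
static void init(void)
{
    for (int i = 0; i < LEN2; i++) {
        for (int j = 0; j < LEN2; j++) {
            aa_ref[i][j] = aa_vec[i][j] = (float)(i * LEN2 + j);
            bb[i][j] = (float)(j - i);
        }
    }
}

/* Scalar s114 kernel, one pass over the triangle. */
static void s114_scalar(float aa[LEN2][LEN2])
{
    for (int i = 0; i < LEN2; i++)
        for (int j = 0; j < i; j++)
            aa[i][j] = aa[j][i] + bb[i][j];
}

/* Emulated predicated kernel: all loads of a block happen before its stores. */
static void s114_predicated(float aa[LEN2][LEN2])
{
    for (int i = 0; i < LEN2; i++) {
        for (int j = 0; j < i; j += VL) {
            float gathered[VL], contiguous[VL];
            int active[VL];
            for (int k = 0; k < VL; k++) {
                active[k] = (j + k) < i;                        /* predicate lane */
                gathered[k] = active[k] ? aa[j + k][i] : 0.0f;   /* gather load */
                contiguous[k] = active[k] ? bb[i][j + k] : 0.0f; /* masked load */
            }
            for (int k = 0; k < VL; k++)
                if (active[k])
                    aa[i][j + k] = gathered[k] + contiguous[k];  /* masked store */
        }
    }
}

/* Returns 1 if the emulated predicated kernel matches the scalar kernel. */
int s114_results_match(void)
{
    init();
    s114_scalar(aa_ref);
    s114_predicated(aa_vec);
    return memcmp(aa_ref, aa_vec, sizeof aa_ref) == 0;
}
```
Calling `s114_results_match()` compares the two kernels on the same input. The blocked version is only valid here because the loads and stores of the inner loop touch disjoint elements; it says nothing about how LAA should prove that.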
</pre>