<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/71521>71521</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] Missed if-conversion and vectorisation opportunity (tsvc, s124)
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AArch64,
vectorization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
sjoerdmeijer
</td>
</tr>
</table>
<pre>
Clang generates a lot of code for a loop that contains an if-then-else statement. The result uses predicated instructions, which do not seem to be necessary judging by GCC's codegen. For this kernel, s124 in TSVC, we are about 60% behind GCC.
Compile this input with `-O3 -mcpu=neoverse-v2 -ffast-math`:
```
__attribute__((aligned(64))) float a[32000],b[32000],c[32000],d[32000],e[32000],
                                   aa[256][256],bb[256][256],cc[256][256],tt[256][256];
int dummy(float[32000], float[32000], float[32000], float[32000], float[32000], float[256][256], float[256][256], float[256][256], float);
struct args_t; /* forward declaration so the snippet compiles on its own */
float s124(struct args_t * func_args)
{
    int j;
    for (int nl = 0; nl < 100000; nl++) {
        j = -1;
        for (int i = 0; i < 32000; i++) {
            /* Both arms increment j and store to a[j]; only the addend differs. */
            if (b[i] > (float)0.) {
                j++;
                a[j] = b[i] + d[i] * e[i];
            } else {
                j++;
                a[j] = c[i] + d[i] * e[i];
            }
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }
    return 0.0f; /* placeholder: the reduced kernel does not compute a result */
}
```
Clang's codegen:
```
.LBB0_3: // Parent Loop BB0_2 Depth=1
ld1w { z0.s }, p2/z, [x12, x9, lsl #2]
ld1w { z1.s }, p2/z, [x25, x9, lsl #2]
ld1w { z2.s }, p2/z, [x24, x9, lsl #2]
asr x10, x8, #30
add x8, x8, x20
add x11, x23, x10
ld1w { z3.s }, p2/z, [x26, x9, lsl #2]
ld1w { z4.s }, p2/z, [x22, x9, lsl #2]
ld1w { z5.s }, p2/z, [x19, x9, lsl #2]
fcmgt p0.s, p2/z, z0.s, #0.0
fcmgt p1.s, p2/z, z1.s, #0.0
sel z0.s, p0, z0.s, z2.s
sel z1.s, p1, z1.s, z3.s
ld1w { z2.s }, p2/z, [x21, x9, lsl #2]
ld1w { z3.s }, p2/z, [x27, x9, lsl #2]
add x9, x9, x15
cmp x28, x9
fmla z0.s, p2/m, z4.s, z2.s
fmla z1.s, p2/m, z5.s, z3.s
st1b { z0.b }, p3, [x23, x10]
st1w { z1.s }, p2, [x11, x14, lsl #2]
b.ne .LBB0_3
```
vs. GCC's codegen:
```
.L2:
ldr q30, [x23, x0]
ldr q27, [x22, x0]
ldr q28, [x21, x0]
ldr q31, [x20, x0]
fcmgt v29.4s, v30.4s, 0
fmla v30.4s, v27.4s, v28.4s
fmla v31.4s, v27.4s, v28.4s
bit v31.16b, v30.16b, v29.16b
str q31, [x19, x0]
add x0, x0, 16
cmp x0, x24
bne .L2
```
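For reference, here is a minimal sketch of what the if-converted loop looks like at the source level. Both arms of the branch increment `j` and store to `a[j]`, so `j` always equals `i` and only the addend (`b[i]` or `c[i]`) depends on the condition. The names `s124_branchy`, `s124_select` and `s124_forms_agree` are made up for this illustration, and pointer parameters replace the globals of the reproducer. Whether either compiler produces the shorter loop from the select form has not been checked here.

```c
#include <stddef.h>

/* Branchy form, mirroring the kernel in this report. */
void s124_branchy(float *a, const float *b, const float *c,
                  const float *d, const float *e, size_t n)
{
    long j = -1;
    for (size_t i = 0; i < n; i++) {
        if (b[i] > 0.0f) {
            j++;
            a[j] = b[i] + d[i] * e[i];
        } else {
            j++;
            a[j] = c[i] + d[i] * e[i];
        }
    }
}

/* Hand if-converted form (illustrative): j == i on every iteration,
 * so the store is unconditional and the branch becomes a select. */
void s124_select(float *a, const float *b, const float *c,
                 const float *d, const float *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float addend = (b[i] > 0.0f) ? b[i] : c[i];
        a[i] = addend + d[i] * e[i];
    }
}

/* Returns 1 if both forms give identical results on a small sample.
 * The sample values are exactly representable, so the comparison does
 * not depend on whether the compiler fuses the multiply-add. */
int s124_forms_agree(void)
{
    enum { N = 8 };
    const float b[N] = { 1.0f, -2.0f, 0.0f, 3.5f, -0.5f, 4.0f, -1.0f, 2.0f };
    const float c[N] = { 9.0f, 8.0f, 7.0f, 6.0f, 5.0f, 4.0f, 3.0f, 2.0f };
    const float d[N] = { 1.0f, 2.0f, 3.0f, 4.0f, 1.0f, 2.0f, 3.0f, 4.0f };
    const float e[N] = { 2.0f, 2.0f, 2.0f, 2.0f, 3.0f, 3.0f, 3.0f, 3.0f };
    float x[N], y[N];

    s124_branchy(x, b, c, d, e, N);
    s124_select(y, b, c, d, e, N);

    for (size_t i = 0; i < N; i++) {
        if (x[i] != y[i])
            return 0;
    }
    return 1;
}
```

The select form matches the shape of GCC's loop above, which computes both `fmla` results and then picks one with `bit`.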
See also:
https://godbolt.org/z/nb6xYxxKo
</pre>