<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/71521>71521</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] Missed if-conversion and vectorisation opportunity (tsvc, s124)
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AArch64,
            vectorization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          sjoerdmeijer
      </td>
    </tr>
</table>

<pre>
Clang generates considerably more code than GCC for a loop that contains an if-then-else statement: it emits predicated instructions that do not appear to be necessary, judging by GCC's codegen. For this kernel, s124 in TSVC, we are about 60% behind.

Compile this input with `-O3 -mcpu=neoverse-v2 -ffast-math`:

```
__attribute__((aligned(64))) float a[32000],b[32000],c[32000],d[32000],e[32000],
    aa[256][256],bb[256][256],cc[256][256],tt[256][256];

int dummy(float[32000], float[32000], float[32000], float[32000], float[32000], float[256][256], float[256][256], float[256][256], float);

float s124(struct args_t * func_args)
{
    int j;
    for (int nl = 0; nl < 100000; nl++) {
        j = -1;
        for (int i = 0; i < 32000; i++) {
            if (b[i] > (float)0.) {
                j++;
                a[j] = b[i] + d[i] * e[i];
            } else {
                j++;
                a[j] = c[i] + d[i] * e[i];
            }
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }
}
```

Clang's codegen:

```
.LBB0_3:                                // Parent Loop BB0_2 Depth=1
        ld1w    { z0.s }, p2/z, [x12, x9, lsl #2]
        ld1w    { z1.s }, p2/z, [x25, x9, lsl #2]
        ld1w    { z2.s }, p2/z, [x24, x9, lsl #2]
        asr     x10, x8, #30
        add     x8, x8, x20
        add     x11, x23, x10
        ld1w    { z3.s }, p2/z, [x26, x9, lsl #2]
        ld1w    { z4.s }, p2/z, [x22, x9, lsl #2]
        ld1w    { z5.s }, p2/z, [x19, x9, lsl #2]
        fcmgt   p0.s, p2/z, z0.s, #0.0
        fcmgt   p1.s, p2/z, z1.s, #0.0
        sel     z0.s, p0, z0.s, z2.s
        sel     z1.s, p1, z1.s, z3.s
        ld1w    { z2.s }, p2/z, [x21, x9, lsl #2]
        ld1w    { z3.s }, p2/z, [x27, x9, lsl #2]
        add     x9, x9, x15
        cmp     x28, x9
        fmla    z0.s, p2/m, z4.s, z2.s
        fmla    z1.s, p2/m, z5.s, z3.s
        st1b    { z0.b }, p3, [x23, x10]
        st1w    { z1.s }, p2, [x11, x14, lsl #2]
        b.ne    .LBB0_3
```

vs. GCC's codegen:

```
.L2:
        ldr     q30, [x23, x0]
        ldr     q27, [x22, x0]
        ldr     q28, [x21, x0]
        ldr     q31, [x20, x0]
        fcmgt   v29.4s, v30.4s, 0
        fmla    v30.4s, v27.4s, v28.4s
        fmla    v31.4s, v27.4s, v28.4s
        bit     v31.16b, v30.16b, v29.16b
        str     q31, [x19, x0]
        add     x0, x0, 16
        cmp     x0, x24
        bne     .L2
```
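
For reference, a minimal sketch (hand-written, not taken from either compiler) of the source-level form the inner loop could be if-converted to. It assumes that `j` always equals `i`, since `j` starts at -1 and is incremented exactly once on both paths, so the branch only selects between `b[i]` and `c[i]` as the addend. The names `s124_branchy` and `s124_select` and the pointer-plus-length signatures are made up for illustration; the reproducer above uses global arrays.

```c
/* Original shape of the inner loop: j is bumped on both paths,
   so j == i at every store. */
void s124_branchy(float *a, const float *b, const float *c,
                  const float *d, const float *e, int n)
{
    int j = -1;
    for (int i = 0; i < n; i++) {
        if (b[i] > 0.0f) {
            j++;
            a[j] = b[i] + d[i] * e[i];
        } else {
            j++;
            a[j] = c[i] + d[i] * e[i];
        }
    }
}

/* Hand if-converted shape (assumption: j == i, see above):
   the condition only picks the addend, the store is unit-stride. */
void s124_select(float *a, const float *b, const float *c,
                 const float *d, const float *e, int n)
{
    for (int i = 0; i < n; i++) {
        float addend = (b[i] > 0.0f) ? b[i] : c[i];
        a[i] = addend + d[i] * e[i];
    }
}
```

In this shape the loop body is a compare, a select and a multiply-add feeding a contiguous store, which looks roughly like what GCC emits above.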

See also:
https://godbolt.org/z/nb6xYxxKo
</pre>