<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/71525">71525</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] VLA slower than VLS (tsvc, s173)
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AArch64,
            vectorization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          sjoerdmeijer
      </td>
    </tr>
</table>

<pre>
    Clang's VLA (SVE) code for this loop is considerably slower than GCC's VLS (NEON) code. Compile this input with `-O3 -mcpu=neoverse-v2 -ffast-math`:

```
__attribute__((aligned(64))) float a[32000],b[32000],c[32000],d[32000],e[32000],
 aa[256][256],bb[256][256],cc[256][256],tt[256][256];

int dummy(float[32000], float[32000], float[32000], float[32000], float[32000], float[256][256], float[256][256], float[256][256], float);

float s173()
{
    int k = 32000/2;
    for (int nl = 0; nl < 10*100000; nl++) {
        for (int i = 0; i < 32000/2; i++) {
            a[i+k] = a[i] + b[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }
}
```

Clang's codegen:

```
.LBB0_3: //   Parent Loop BB0_2 Depth=1
        add     x9, x19, x8, lsl #2
        add     x10, x20, x8, lsl #2
        ld1w    { z0.s }, p0/z, [x19, x8, lsl #2]
        ld1w    { z2.s }, p0/z, [x20, x8, lsl #2]
        add     x8, x8, x21
        ld1w    { z1.s }, p0/z, [x9, x28, lsl #2]
        ld1w    { z3.s }, p0/z, [x10, x28, lsl #2]
        add     x10, x9, x26
        cmp     x8, x22
        fadd    z0.s, z2.s, z0.s
        fadd    z1.s, z3.s, z1.s
        st1w    { z0.s }, p0, [x9, x23, lsl #2]
        st1w    { z1.s }, p0, [x10, x28, lsl #2]
        b.ne    .LBB0_3
```

vs. GCC's codegen:

```
.L3:
        ldr     q31, [x20, x0]
        ldr     q30, [x19, x0]
        fadd    v31.4s, v31.4s, v30.4s
        str     q31, [x21, x0]
        add     x0, x0, 16
        cmp     x0, x28
        bne     .L3
```
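
For anyone less familiar with the VLA/VLS terminology in the title, here is a rough scalar model of the two strategies in plain C. It is only a sketch, not what either compiler emits: `s173_vls_model`, `s173_vla_model`, the `lanes` parameter, and `models_agree` are hypothetical names made up for illustration. The kernel reads a[0..k) and writes a[k..2k), so the two ranges never overlap and any lane count should give the same result as the scalar loop.

```c
enum { LEN = 32000, K = LEN / 2 };

/* Scalar reference: the inner loop of s173 from the reproducer. */
static void s173_scalar(float *a, const float *b) {
    for (int i = 0; i < K; i++)
        a[i + K] = a[i] + b[i];
}

/* VLS model (assumption: 4 float lanes, i.e. one 128-bit register).
   The lane count is a compile-time constant; K is a multiple of 4,
   so no tail loop is needed. */
static void s173_vls_model(float *a, const float *b) {
    for (int i = 0; i < K; i += 4)
        for (int lane = 0; lane < 4; lane++)
            a[i + lane + K] = a[i + lane] + b[i + lane];
}

/* VLA model: the lane count is only known at run time. The hypothetical
   `lanes` parameter stands in for the hardware vector length. */
static void s173_vla_model(float *a, const float *b, int lanes) {
    int i = 0;
    if (lanes < 1)
        lanes = 1; /* guard against a non-positive lane count */
    for (; i + lanes <= K; i += lanes)
        for (int lane = 0; lane < lanes; lane++)
            a[i + lane + K] = a[i + lane] + b[i + lane];
    for (; i < K; i++) /* scalar tail for the remainder */
        a[i + K] = a[i] + b[i];
}

/* Fill the inputs with values that are exactly representable as float. */
static void init(float *a, float *b) {
    for (int i = 0; i < LEN; i++) {
        a[i] = (float)i * 0.5f;
        b[i] = (float)(LEN - i);
    }
}

/* Returns 1 if both models reproduce the scalar result, 0 otherwise. */
static int models_agree(int lanes) {
    static float ref[LEN], vls[LEN], vla[LEN], b[LEN];
    init(ref, b);
    s173_scalar(ref, b);
    init(vls, b);
    s173_vls_model(vls, b);
    init(vla, b);
    s173_vla_model(vla, b, lanes);
    for (int i = 0; i < LEN; i++)
        if (ref[i] != vls[i] || ref[i] != vla[i])
            return 0;
    return 1;
}
```

Calling `models_agree(4)` or `models_agree(8)` from a small driver should return 1. This only models the iteration structure of the two approaches; it says nothing about why the SVE loop above is slower.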

See also:
https://godbolt.org/z/9zs65h3aq

Might be caused by the same underlying issue as:
https://github.com/llvm/llvm-project/issues/71524

</pre>