<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/81112>81112</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64][SVE] Cannot be vectorized, but GCC can vectorize.(TSVC s235)
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          m-saito-fj
      </td>
    </tr>
</table>

<pre>
Clang cannot vectorize TSVC s235 with SVE, but GCC 13.2.0 can.

Option:
`-Ofast -march=armv8.2-a+sve`

```c
#define LEN 32000
#define LEN2 256
static int ntimes = 200000;

float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];
float aa[LEN2][LEN2], bb[LEN2][LEN2], cc[LEN2][LEN2], dd[LEN2][LEN2];

int dummy(float[LEN], float[LEN], float[LEN], float[LEN], float[LEN],
          float[LEN2][LEN2], float[LEN2][LEN2], float[LEN2][LEN2], float);

int s235()
{
        for (int nl = 0; nl < 200*(ntimes/LEN2); nl++) {
                for (int i = 0; i < LEN2; i++) {
                        a[i] += b[i] * c[i];
                        for (int j = 1; j < LEN2; j++) {
                                aa[j][i] = aa[j-1][i] + bb[j][i] * a[i];
                        }
                }
                dummy(a, b, c, d, e, aa, bb, cc, 0.);
        }
        return 0;
}
```

See also (Clang vs GCC):
https://godbolt.org/z/KeeK58oz7

GCC result:
```asm
.L4:
        add     x8, x9, 1024
        mov     x0, 0
        lsl     x11, x12, 2
        add     x13, x20, x11
        add     x15, x1, x11
        ld1w    z1.s, p0/z, [x13]
        ld1w    z0.s, p0/z, [x15]
        add     x11, x2, x11
        ld1w    z2.s, p0/z, [x11]
        fmad    z2.s, p1/m, z0.s, z1.s
        st1w    z2.s, p0, [x13]
.L3:
        ld1w    z1.s, p0/z, [x9, x0, lsl 2]
        ld1w    z0.s, p0/z, [x10, x0, lsl 2]
        fmad    z0.s, p1/m, z2.s, z1.s
        st1w    z0.s, p0, [x8, x0, lsl 2]
        add     x0, x0, 256
        cmp     x0, x19
        bne     .L3
        add     x12, x12, x16
        add     x9, x9, x21
        add     x10, x10, x21
        whilelo p0.s, w12, w14
        b.any   .L4
```
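From this output, GCC appears to vectorize the outer loop over `i`: the inner `.L3` loop steps through `j` one row at a time (256 elements per step) while each vector holds consecutive `i` elements.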
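
To illustrate why vectorizing over `i` is legal, here is a minimal sketch (not GCC's actual transformation) that distributes the `a[i]` update out of the nest and interchanges the `i` and `j` loops. The recurrence only runs along `j`, so the `i` iterations are independent. The names `N`, `s235_original`, `s235_interchanged`, and `s235_forms_agree` are hypothetical and exist only for this example; the input values are small integers so that the float arithmetic is exact.

```c
#define N 64  /* hypothetical size for the sketch; the report uses LEN2 = 256 */

/* Original s235 nest: i is the outer loop, j carries the recurrence. */
static void s235_original(float a[N], const float b[N], const float c[N],
                          float aa[N][N], float bb[N][N])
{
    for (int i = 0; i < N; i++) {
        a[i] += b[i] * c[i];
        for (int j = 1; j < N; j++)
            aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
    }
}

/* Distributed and interchanged form: the innermost loop now runs over i,
   has unit stride, and carries no dependence between iterations. */
static void s235_interchanged(float a[N], const float b[N], const float c[N],
                              float aa[N][N], float bb[N][N])
{
    for (int i = 0; i < N; i++)
        a[i] += b[i] * c[i];
    for (int j = 1; j < N; j++)
        for (int i = 0; i < N; i++)
            aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
}

/* Runs both forms on identical inputs; returns 1 if all results match. */
int s235_forms_agree(void)
{
    static float a1[N], a2[N], b[N], c[N];
    static float aa1[N][N], aa2[N][N], bb[N][N];

    for (int i = 0; i < N; i++) {
        a1[i] = a2[i] = 1.0f;
        b[i] = (float)(i % 3);
        c[i] = 2.0f;
        for (int j = 0; j < N; j++) {
            aa1[j][i] = aa2[j][i] = (float)(j % 4);
            bb[j][i] = (float)((i + j) % 2);
        }
    }

    s235_original(a1, b, c, aa1, bb);
    s235_interchanged(a2, b, c, aa2, bb);

    for (int i = 0; i < N; i++) {
        if (a1[i] != a2[i])
            return 0;
        for (int j = 0; j < N; j++)
            if (aa1[j][i] != aa2[j][i])
                return 0;
    }
    return 1;
}
```

Calling `s235_forms_agree()` from a test harness should return 1. This rewrite relies on the arrays not aliasing each other, which holds for the distinct global arrays in the reproducer.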

`-mllvm -debug-only=loop-vectorize` messages:
```
LV: Checking a loop in 's235' from s235.c:19:4
LV: Loop hints: force=? width=vscale x 0 interleave=0
LV: Found a loop: for.body13
LV: Not vectorizing: Found an unidentified PHI   %3 = phi float [ %.pre, %for.body4 ], [ %add25, %for.body13 ], !dbg !30
LV: Interleaving disabled by the pass manager
LV: Can't vectorize the instructions or CFG
```
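LLVM does not appear to handle vectorization of loop nests of the form found in s235: the log shows only the inner `j` loop (`for.body13`) being considered, and it is rejected because of the unidentified PHI.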
</pre>