<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/56591>56591</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            flag set from SVE svwhilelt intrinsic not reused in loop
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          yuyichao
      </td>
    </tr>
</table>

<pre>
    Ref: GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106340

The LLVM SVE autovectorizer does not seem to emit VLA-style code, so I can't directly compare manually written code with automatically vectorized code as in the GCC bug report. At least the code emitted for the manually written version is similar.

I expect the following code to be compiled to efficient VLA-style assembly code:
```c++
void set2(uint32_t *__restrict__ out, size_t m)
{
    auto svelen = svcntw();
    auto v = svdup_u32(1);
    if (m != 0) {
        auto pg = svwhilelt_b32(0ul, m);
        for (size_t i = 0; i < m; i += svelen, pg = svwhilelt_b32(i, m)) {
            svst1(pg, &out[i], v);
        }
    }
}
```

And the code LLVM emits is
```asm
        cbz     x1, .LBB1_3
        mov     x8, xzr
        cntw    x9
        whilelo p0.s, xzr, x1
        mov     z0.s, #1
.LBB1_2:
        st1w    { z0.s }, p0, [x0, x8, lsl #2]
        add     x8, x8, x9
        whilelo p0.s, x8, x1
        cmp     x8, x1
        b.lo    .LBB1_2
.LBB1_3:
        ret
```

which is almost identical to the GCC autovectorized code (since the C++ source is written this way...), except that the `cmp` and `b.lo` are redundant: `whilelo` already sets the condition flags, so the pair could simply be replaced with a single conditional branch on its result (for example `b.any` or `b.first`).
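
To illustrate why the separate compare is redundant, here is a minimal scalar sketch of the loop. It is only a model and not real SVE code: `kLanes`, `WhileLoModel`, `model_whilelo` and `set2_model` are hypothetical names introduced for this example, and the lane count is fixed at 4 instead of being vector-length agnostic. The point it shows is that "any lane of the predicate is active" holds exactly when `i < m`, so the loop can exit on the predicate result alone.

```c++
// Scalar model only: kLanes is a hypothetical fixed lane count,
// not a real (runtime) SVE vector length.
constexpr unsigned long long kLanes = 4;

// Hypothetical stand-in for the predicate and flag result of `whilelo`.
struct WhileLoModel {
    bool lane[kLanes]; // lane[k] is active iff i + k is below m
    bool any_active;   // what a flag-based branch would test
};

WhileLoModel model_whilelo(unsigned long long i, unsigned long long m)
{
    WhileLoModel r = {};
    for (unsigned long long k = 0; k < kLanes; ++k) {
        // Written as k < m - i so that i + k cannot wrap around.
        r.lane[k] = (i < m) && (k < m - i);
        r.any_active = r.any_active || r.lane[k];
    }
    return r;
}

// Model of set2 whose loop exit depends only on the predicate result,
// with no separate i < m comparison. Returns the number of iterations.
unsigned long long set2_model(unsigned int *out, unsigned long long m)
{
    unsigned long long iterations = 0;
    unsigned long long i = 0;
    WhileLoModel pg = model_whilelo(i, m);
    while (pg.any_active) {
        for (unsigned long long k = 0; k < kLanes; ++k) {
            if (pg.lane[k]) {
                out[i + k] = 1; // predicated store: inactive lanes are skipped
            }
        }
        i += kLanes;
        pg = model_whilelo(i, m);
        ++iterations;
    }
    return iterations;
}
```

In this model the loop writes exactly the first `m` elements and runs `ceil(m / kLanes)` times, which is the same behaviour as the version that re-checks `i < m` on every iteration.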

In the longer term, I would also expect source code like the following to produce the same efficient loop:

```c++
void set3(uint32_t *__restrict__ out, size_t m)
{
    auto svelen = svcntw();
    auto v = svdup_u32(1);
    for (size_t i = 0; i < m; i += svelen) {
        auto pg = svwhilelt_b32(i, m);
        svst1(pg, &out[i], v);
    }
}
```

Currently, the loop transformation separates the `cmp` from its paired `whilelt`, so the flags set by `whilelt` cannot be reused for the loop branch.

</pre>