<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/56591>56591</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            flag set from SVE svwhilelt intrinsic not reused in loop

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            new issue

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          yuyichao

      </td>

    </tr>

</table>

<pre>

Ref GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106340

The LLVM SVE autovectorizer does not seem to emit VLA-style code, so I can't directly compare manually written code against automatically vectorized code as in the GCC bug report. At least the code emitted for the manual version is similar.

I expect the following code to be compiled to efficient VLA-style assembly code:

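```c++
#include <arm_sve.h>
#include <cstddef>
#include <cstdint>

void set2(uint32_t *__restrict__ out, size_t m)
{
    auto svelen = svcntw();   // number of 32-bit lanes per vector
    auto v = svdup_u32(1);
    if (m != 0) {
        // Predicate for the first iteration; recomputed in the loop increment.
        auto pg = svwhilelt_b32(0ul, m);
        for (size_t i = 0; i < m; i += svelen, pg = svwhilelt_b32(i, m)) {
            svst1(pg, &out[i], v);
        }
    }
}
```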

The code LLVM emits is:

```asm
        cbz     x1, .LBB1_3
        mov     x8, xzr
        cntw    x9
        whilelo p0.s, xzr, x1
        mov     z0.s, #1
.LBB1_2:
        st1w    { z0.s }, p0, [x0, x8, lsl #2]
        add     x8, x8, x9
        whilelo p0.s, x8, x1
        cmp     x8, x1
        b.lo    .LBB1_2
.LBB1_3:
        ret
```

This is almost identical to the GCC autovectorized code (since the C++ source is written this way...), except that the `cmp` and `b.lo` are redundant: `whilelo` already sets the condition flags, so the pair can simply be replaced with a single conditional branch on the result of the `whilelo` instruction.
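
To illustrate why the `cmp` is redundant, here is a minimal scalar model of the loop above. It is only a sketch under stated assumptions: `WhileloResult`, `whilelo_model`, `flags_match_compare` and the fixed lane count `kLanes = 4` are hypothetical names made up for illustration (real SVE hardware determines the vector length), and it assumes `i + k` does not overflow. It checks that the "first lane active" and "any lane active" conditions derived from a `whilelo`-style predicate always agree with `i < m`, which is what the `cmp`/`b.lo` pair recomputes.

```c++
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count standing in for svcntw(); not a real SVE constant.
constexpr std::size_t kLanes = 4;

// Scalar model of the result of `whilelo p.s, i, m`.
struct WhileloResult {
    std::array<bool, kLanes> lanes;  // lane k is active iff i + k < m
    bool first_active;               // models the N flag (FIRST condition)
    bool any_active;                 // models Z == 0 (ANY condition)
};

// Assumes i + k does not overflow for the values used here.
WhileloResult whilelo_model(std::uint64_t i, std::uint64_t m)
{
    WhileloResult r{};
    for (std::size_t k = 0; k < kLanes; ++k) {
        r.lanes[k] = (i + k) < m;
        r.any_active = r.any_active || r.lanes[k];
    }
    r.first_active = r.lanes[0];
    return r;
}

// Walks the same induction sequence as set2 and returns true if, at every
// step, the flags from the modeled whilelo agree with the explicit `i < m`.
bool flags_match_compare(std::uint64_t m)
{
    for (std::uint64_t i = 0;; i += kLanes) {
        const WhileloResult r = whilelo_model(i, m);
        const bool cmp_lo = i < m;  // what `cmp x8, x1; b.lo` tests
        if (r.first_active != cmp_lo || r.any_active != cmp_lo) {
            return false;
        }
        if (!cmp_lo) {
            return true;  // loop exit reached with flags and compare in agreement
        }
    }
}
```

For example, `flags_match_compare(10)` walks `i = 0, 4, 8, 12`, and the flag-based and compare-based exit conditions agree at every step.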

In the longer term, I would also expect source code like the following to work:

```c++
void set3(uint32_t *__restrict__ out, size_t m)
{
    auto svelen = svcntw();
    auto v = svdup_u32(1);
    for (size_t i = 0; i < m; i += svelen) {
        auto pg = svwhilelt_b32(i, m);
        svst1(pg, &out[i], v);
    }
}
```

Currently, the loop transformation separates the `cmp` from its paired `whilelt`, which makes the flags set by `whilelt` unusable for the loop branch.

</pre>
