<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/56591>56591</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
flag set from SVE svwhilelt intrinsic not reused in loop
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
yuyichao
</td>
</tr>
</table>
<pre>
Reference GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106340
The LLVM SVE autovectorizer does not seem to emit VLA-style code, so I can't directly compare manually written code with automatically vectorized code as in the GCC bug report. At least the code emitted for the manually written version is similar.
I expect the following code to be compiled to efficient VLA-style assembly code:
```c++
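#include <arm_sve.h>
#include <cstddef>
#include <cstdint>

void set2(uint32_t *__restrict__ out, size_t m)
{
    auto svelen = svcntw();
    auto v = svdup_u32(1);
    if (m != 0) {
        // Initial predicate; later predicates are recomputed in the loop increment.
        auto pg = svwhilelt_b32(0ul, m);
        for (size_t i = 0; i < m; i += svelen, pg = svwhilelt_b32(i, m)) {
            svst1(pg, &out[i], v);
        }
    }
}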
```
The code LLVM emits is:
```asm
    cbz x1, .LBB1_3
    mov x8, xzr
    cntw x9
    whilelo p0.s, xzr, x1
    mov z0.s, #1
.LBB1_2:
    st1w { z0.s }, p0, [x0, x8, lsl #2]
    add x8, x8, x9
    whilelo p0.s, x8, x1
    cmp x8, x1
    b.lo .LBB1_2
.LBB1_3:
    ret
```
This is almost identical to the GCC autovectorized code (since the C++ source is written this way), except that the `cmp` and `b.lo` are redundant: `whilelo` already sets the condition flags, so the pair can simply be replaced with a single conditional branch on the result of the `whilelo` instruction.
In the longer term, I would also expect source code like the following to compile to the same loop:
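To illustrate why the `cmp` is redundant, here is a minimal scalar sketch (plain C++, not SVE code) that models the predicate and flags produced by `whilelo`. The names `WhileloResult`, `model_whilelo`, `flags_match_cmp` and `check_all` are hypothetical, and the mapping of "first lane active" to the N flag and "no lane active" to the Z flag is stated as an assumption about the instruction rather than taken from the compiler output above.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of `whilelo pd.s, i, m`; not real SVE code.
struct WhileloResult {
    std::vector<bool> lanes;  // lane k is active iff i + k < m
    bool first_active;        // assumed to correspond to the N flag
    bool none_active;         // assumed to correspond to the Z flag
};

WhileloResult model_whilelo(std::uint64_t i, std::uint64_t m, std::size_t num_lanes) {
    WhileloResult r{std::vector<bool>(num_lanes, false), false, true};
    for (std::size_t k = 0; k < num_lanes; ++k) {
        // i + k < m, written so that the addition cannot overflow.
        const bool active = i < m && k < m - i;
        r.lanes[k] = active;
        if (active) r.none_active = false;
    }
    r.first_active = num_lanes > 0 && r.lanes[0];
    return r;
}

// True when both flags carry the same information as the `cmp i, m` / `b.lo` pair.
bool flags_match_cmp(std::uint64_t i, std::uint64_t m, std::size_t num_lanes) {
    const WhileloResult r = model_whilelo(i, m, num_lanes);
    const bool cmp_says_continue = i < m;
    return r.first_active == cmp_says_continue &&
           r.none_active == !cmp_says_continue;
}

// Checks small inputs for a few hypothetical vector lengths (in 32-bit lanes).
void check_all() {
    for (std::size_t num_lanes : {4, 8, 16, 64})
        for (std::uint64_t m = 0; m < 40; ++m)
            for (std::uint64_t i = 0; i < 48; ++i)
                assert(flags_match_cmp(i, m, num_lanes));
}
```

Calling `check_all()` exercises the model; under these assumptions, a branch on either flag decides the loop exit exactly as `cmp x8, x1` followed by `b.lo` does.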
```c++
void set3(uint32_t *__restrict__ out, size_t m)
{
    auto svelen = svcntw();
    auto v = svdup_u32(1);
    for (size_t i = 0; i < m; i += svelen) {
        auto pg = svwhilelt_b32(i, m);
        svst1(pg, &out[i], v);
    }
}
```
Currently, the loop transformation separates the `cmp` from the `whilelt` it pairs with, which makes the flags set by `whilelt` unusable for the loop branch.
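As a rough illustration of the equivalence expected here, the following scalar sketch (plain C++, not SVE code) models both source forms and a third loop that exits purely on the predicate, with no separate compare. All names (`Buffer`, `active_lanes`, `store_active`, `like_set2`, `like_set3`, `flag_driven`) are hypothetical, `vl` stands in for the value of `svcntw()`, and the predicate is reduced to a count of leading active lanes, which is an assumption made to keep the model small.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint32_t>;

// Stand-in for svwhilelt_b32(i, m): the number of leading active lanes.
// A result of 0 models "no lane active". Requires vl >= 1.
std::size_t active_lanes(std::size_t i, std::size_t m, std::size_t vl) {
    return i < m ? std::min(m - i, vl) : 0;
}

// Stand-in for svst1(pg, &out[i], svdup_u32(1)).
void store_active(Buffer& out, std::size_t i, std::size_t pg) {
    for (std::size_t k = 0; k < pg; ++k) out[i + k] = 1;
}

// Loop shaped like set2: predicate computed before the loop and in the increment.
Buffer like_set2(std::size_t m, std::size_t vl) {
    Buffer out(m, 0);
    if (m != 0) {
        std::size_t pg = active_lanes(0, m, vl);
        for (std::size_t i = 0; i < m; i += vl, pg = active_lanes(i, m, vl))
            store_active(out, i, pg);
    }
    return out;
}

// Loop shaped like set3: predicate computed at the top of the body.
Buffer like_set3(std::size_t m, std::size_t vl) {
    Buffer out(m, 0);
    for (std::size_t i = 0; i < m; i += vl)
        store_active(out, i, active_lanes(i, m, vl));
    return out;
}

// Loop that continues only while the predicate has an active lane,
// modelling a branch on the flags instead of `cmp` + `b.lo`.
Buffer flag_driven(std::size_t m, std::size_t vl) {
    Buffer out(m, 0);
    std::size_t i = 0;
    std::size_t pg = active_lanes(i, m, vl);
    while (pg != 0) {
        store_active(out, i, pg);
        i += vl;
        pg = active_lanes(i, m, vl);
    }
    return out;
}
```

In this model all three functions fill the same `m` elements for any `vl >= 1`, which is the behaviour a flag-based branch would need to preserve.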
</pre>