<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/132180>132180</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[RISCV][EVL] Improve AnyOf reduction codegen
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:RISC-V,
vectorizers
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
lukel97
</td>
</tr>
</table>
<pre>
On RISC-V, an AnyOf reduction is vectorized quite poorly with EVL tail folding:
```c
int f(int *x, int y, int n) {
int z = 0;
for (int i = 0; i < n; i++)
if (x[i] == y)
z = 1;
return z;
}
```
```asm
.LBB0_2: # %vector.body
# =>This Inner Loop Header: Depth=1
sub t0, a2, a3
sh2add a6, a3, a0
vsetvli t1, t0, e8, mf2, ta, ma
vsetvli a4, zero, e64, m4, ta, ma
vmv.v.x v16, t1
vmsleu.vv v9, v16, v12
vsetvli zero, t0, e32, m2, ta, ma
vle32.v v10, (a6)
sub a5, a5, a7
vsetvli a4, zero, e64, m4, ta, ma
vmsltu.vx v16, v12, t1
vmand.mm v9, v8, v9
vsetvli zero, zero, e32, m2, ta, ma
vmseq.vx v17, v10, a1
vmor.mm v8, v8, v17
vmand.mm v8, v8, v16
vmor.mm v8, v8, v9
add a3, a3, t1
bnez a5, .LBB0_2
# %bb.3: # %middle.block
vcpop.m a0, v8
snez a0, a0
ret
```
Compare this to the version without EVL tail folding:
```asm
.LBB0_5: # %vector.body
# =>This Inner Loop Header: Depth=1
vl2re32.v v10, (a3)
add a3, a3, a4
vsetvli zero, zero, e32, m2, ta, ma
vmseq.vx v9, v10, a1
vmor.mm v8, v8, v9
bne a3, a5, .LBB0_5
# %bb.6: # %middle.block
vcpop.m a3, v8
# ...
```
The cause is the poor lowering of the i1 vp.merge, which in turn stems from the RISC-V ISA having no tail-preserving mask instructions.
A better lowering is likely to widen the i1 mask to an i8 vector:
```asm
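vsetvli a5, zero, e8, m1, ta, ma
vmv.v.i v9, 0 # zero splat, false operand of the vmerge
vmv.v.i v11, 0 # i8 accumulator
loop:
vsetvli a5, a7, e32, m1, ta, ma
vle32.v v8, (a0)
sh2add a0, a5, a0 # advance by a5 i32 elements
vmseq.vx v0, v8, a1 # a1 = y, as in the loops above
vsetvli zero, zero, e8, mf4, tu, ma # tu: lanes past vl keep their value
vmerge.vim v10, v9, 1, v0
vor.vv v11, v11, v10
sub a7, a7, a5
bnez a7, loop
exit:
vsetvli a5, zero, e8, mf4, ta, ma # reduce over all lanes, not just the last vl
vmsne.vi v10, v11, 0
vcpop.m a1, v10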
```
Performing this as an in-loop reduction was also discussed in #131830, but it's likely not as profitable as the widened out-of-loop reduction.
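For reference, the widened out-of-loop reduction can be modelled in scalar C. This is only a sketch of the intended semantics, not the vectorizer's actual output: `VLMAX` and `any_of_evl_model` are hypothetical names, the lane count is arbitrary, and lanes at or past the explicit vector length are assumed to keep their accumulated value (a tail-undisturbed update).

```c
#include <stddef.h>
#include <stdint.h>

#define VLMAX 8 /* hypothetical lane count; real hardware depends on VLEN */

/* Scalar model of an AnyOf reduction with an i8 accumulator per lane. */
int any_of_evl_model(const int *x, int y, int n) {
  uint8_t acc[VLMAX] = {0}; /* widened accumulator, one byte per lane */

  for (int i = 0; i < n;) {
    /* Stands in for vsetvli: process at most VLMAX elements this trip. */
    int evl = (n - i < VLMAX) ? (n - i) : VLMAX;

    /* Only lanes below evl are updated; the rest keep earlier results. */
    for (int lane = 0; lane < evl; lane++)
      acc[lane] |= (uint8_t)(x[i + lane] == y);

    i += evl;
  }

  /* Out-of-loop reduction: check every lane, not just the last evl. */
  int any = 0;
  for (int lane = 0; lane < VLMAX; lane++)
    any |= (acc[lane] != 0);
  return any;
}
```

The final loop covers all `VLMAX` lanes because a match recorded in a high lane during a full-length iteration must survive a shorter last iteration.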
</pre>