<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/132180>132180</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [RISCV][EVL] Improve AnyOf reduction codegen
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:RISC-V,
            vectorizers
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          lukel97
      </td>
    </tr>
</table>

<pre>
    On RISC-V, an AnyOf reduction is vectorized quite poorly with EVL tail folding:

```c
int f(int *x, int y, int n) {
  int z = 0;
  for (int i = 0; i < n; i++)
    if (x[i] == y)
      z = 1;
  return z;
}
```

```asm
.LBB0_2:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        sub     t0, a2, a3
        sh2add  a6, a3, a0
        vsetvli t1, t0, e8, mf2, ta, ma
        vsetvli a4, zero, e64, m4, ta, ma
        vmv.v.x v16, t1
        vmsleu.vv       v9, v16, v12
        vsetvli zero, t0, e32, m2, ta, ma
        vle32.v v10, (a6)
        sub     a5, a5, a7
        vsetvli a4, zero, e64, m4, ta, ma
        vmsltu.vx       v16, v12, t1
        vmand.mm        v9, v8, v9
        vsetvli zero, zero, e32, m2, ta, ma
        vmseq.vx        v17, v10, a1
        vmor.mm v8, v8, v17
        vmand.mm        v8, v8, v16
        vmor.mm v8, v8, v9
        add     a3, a3, t1
        bnez    a5, .LBB0_2
# %bb.3:                                # %middle.block
        vcpop.m a0, v8
        snez    a0, a0
        ret
```

Compare this to the loop vectorized without EVL tail folding:

```asm
.LBB0_5: # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        vl2re32.v       v10, (a3)
        add     a3, a3, a4
        vsetvli zero, zero, e32, m2, ta, ma
        vmseq.vx        v9, v10, a1
        vmor.mm v8, v8, v9
        bne     a3, a5, .LBB0_5
# %bb.6: # %middle.block
        vcpop.m a3, v8
        # ...
```

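The problem is the i1 vp.merge used to update the accumulator. It lowers poorly because the RISC-V ISA has no tail-preserving mask instructions: writes to a mask register are always tail agnostic. As a result, the lanes past the EVL have to be preserved by the extra lane-index compares and vmand.mm/vmor.mm sequence seen in the first listing.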
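
To make the cost concrete, here is a scalar C model of what the EVL loop above appears to compute. It assumes the accumulator update is `vp.merge(true, acc | cmp, acc, evl)`, models each i1 lane as one byte, and uses a hypothetical `VLMAX` of 8 with EVL = min(remaining, VLMAX). Real `vsetvli` may pick a smaller EVL near the end of the loop. The per-lane blend mirrors the `vmsltu`/`vmsleu` compares against what looks like a step vector in `v12`, followed by the `vmand.mm`/`vmor.mm` operations.

```c
/* Scalar model of the EVL tail-folded AnyOf loop. VLMAX and the function
 * name are hypothetical; each array element stands for one i1 lane. */
#define VLMAX 8 /* hypothetical number of lanes per vector */

int any_of_evl_mask_model(const int *x, int y, int n) {
  unsigned char acc[VLMAX] = {0}; /* i1 accumulator, one byte per lane */
  for (int i = 0; i < n;) {
    /* Simplification: EVL = min(remaining, VLMAX). */
    int evl = n - i < VLMAX ? n - i : VLMAX;
    for (int lane = 0; lane < VLMAX; lane++) {
      unsigned char active = lane < evl; /* like vmsltu.vx: step < evl */
      unsigned char tail = evl <= lane;  /* like vmsleu.vv: evl <= step */
      /* Only lanes below the EVL are loaded and compared (vle32 + vmseq). */
      unsigned char cmp = active && x[i + lane] == y;
      /* Mask writes are tail agnostic, so the vp.merge is rebuilt from
       * whole-register mask ops (vmand.mm / vmor.mm) using the two masks. */
      acc[lane] = (unsigned char)(((acc[lane] | cmp) & active) |
                                  (acc[lane] & tail));
    }
    i += evl;
  }
  /* middle.block: vcpop.m over the accumulator, then snez. */
  int count = 0;
  for (int lane = 0; lane < VLMAX; lane++)
    count += acc[lane];
  return count != 0;
}
```

Every iteration pays for two lane-index compares and four mask operations only to keep the tail lanes intact. The non-tail-folded loop needs just the vmor.mm.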

A better lowering would likely widen the accumulator to an i8 vector, since ordinary (non-mask) vector instructions can use a tail-undisturbed policy:

```asm
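        vsetvli a5, zero, e8, m1, ta, ma
        vmv.v.i v9, 0
        vmv.v.i v11, 0                  # zero the i8 accumulator
loop:
        vsetvli a5, a7, e32, m1, ta, ma
        vle32.v v8, (a0)
        sh2add  a0, a5, a0              # advance x by a5 * 4 bytes
        vmseq.vx        v0, v8, a1      # x[i] == y
        vsetvli zero, zero, e8, mf4, tu, ma  # tu: keep lanes past the EVL
        vmerge.vim      v10, v9, 1, v0
        vor.vv  v11, v11, v10
        sub     a7, a7, a5
        bnez    a7, loop
exit:
        vsetvli a5, zero, e8, mf4, ta, ma    # reduce over all VLMAX lanes
        vmsne.vi        v10, v11, 0
        vcpop.m a1, v10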
```
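
The proposed lowering can be modelled the same way. This sketch uses the same assumptions: a hypothetical `VLMAX` and a simplified EVL. The i8 accumulator needs no lane-index compares, provided two conditions hold. First, the e8 operations must use a tail-undisturbed (`tu`) policy, so the OR never writes lanes past the EVL. Second, the final `vmsne.vi` and `vcpop.m` must cover all VLMAX lanes, not just the last EVL.

```c
/* Scalar model of the proposed i8-widened lowering. VLMAX and the function
 * name are hypothetical. */
#define VLMAX 8 /* hypothetical number of lanes per vector */

int any_of_widened_model(const int *x, int y, int n) {
  unsigned char acc[VLMAX] = {0}; /* vmv.v.i: zeroed i8 accumulator */
  for (int i = 0; i < n;) {
    /* Simplification: EVL = min(remaining, VLMAX). */
    int evl = n - i < VLMAX ? n - i : VLMAX;
    for (int lane = 0; lane < evl; lane++) {
      /* vmseq.vx + vmerge.vim: turn the compare result into an i8 0/1. */
      unsigned char bit = x[i + lane] == y ? 1 : 0;
      /* vor.vv on body lanes only. */
      acc[lane] |= bit;
    }
    /* Lanes evl..VLMAX-1 are left untouched: the tail-undisturbed (tu)
     * behaviour that i8 vector instructions offer and mask writes lack. */
    i += evl;
  }
  /* exit: vmsne.vi + vcpop.m over all VLMAX lanes. */
  int count = 0;
  for (int lane = 0; lane < VLMAX; lane++)
    count += acc[lane] != 0;
  return count != 0;
}
```

The loop body now costs a vmerge.vim and a vor.vv on i8 data, instead of the compare-and-blend sequence on masks.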

Performing this as an in-loop reduction was also discussed in #131830, but that is likely less profitable than the widened out-of-loop reduction.
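
For comparison, an in-loop reduction carries only a scalar across iterations. The sketch below shows one assumed shape for it; the approach discussed in #131830 may differ. It uses the same hypothetical `VLMAX` and simplified EVL.

```c
/* Scalar model of an in-loop AnyOf reduction. VLMAX and the function name
 * are hypothetical. */
#define VLMAX 8 /* hypothetical number of lanes per vector */

int any_of_in_loop_model(const int *x, int y, int n) {
  int found = 0; /* scalar accumulator carried across iterations */
  for (int i = 0; i < n;) {
    /* Simplification: EVL = min(remaining, VLMAX). */
    int evl = n - i < VLMAX ? n - i : VLMAX;
    /* vmseq.vx + vcpop.m under the EVL: count matching lanes. */
    int count = 0;
    for (int lane = 0; lane < evl; lane++)
      count += x[i + lane] == y;
    /* snez + or: fold this chunk's result into the scalar. */
    found |= count != 0;
    i += evl;
  }
  return found;
}
```

No vector state survives between iterations, so the tail policy no longer matters. However, every iteration now pays for a vcpop.m and a vector-to-scalar dependency, which is presumably why the widened out-of-loop form is expected to be cheaper.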
</pre>