<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/55447>55447</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [RISCV] Poor vector codegen for integer max reduction with Zvl512b
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          preames
      </td>
    </tr>
</table>

<pre>
    https://godbolt.org/z/8vj83rj14

Here's a slightly simplified reproducer:

```
$ cat vector_max_reduce.c 
int vector_max_reduce_i32(int* a, unsigned a_len) {
  int max = -987654321;
  for (unsigned i = 0; i < a_len; i++)
    max = (a[i] > max) ? a[i] : max;
  return max;
}

$ clang -S vector_max_reduce.c --target=riscv64 -mllvm -riscv-v-vector-bits-min=512 -Xclang -target-feature -Xclang +v,+f,+m,+d,+zba -O2 -emit-llvm
```

Key observations here:

1. We only get the double vector loop structure (vector main loop plus vector epilogue) with Zvl512b and above. Below that, we generate a much more reasonable single vector loop and scalar epilogue. The opcode (max vs add) does not seem to influence this choice.
2. Even if we generate the vector epilogue, we don't need to do a vector-to-scalar reduction in between. We do need to do a partial reduction (in this case from 16 lanes to 8 lanes), but we'd be better off deferring the final scalar reduction until the end.
3. In the final assembly (key block copied below), we seem to have folded the scalar reduction step back into the main vector body.  Note that this is not the IR form!

```
        addi    a4, a5, 64
        vle32.v v10, (a5)
        vle32.v v11, (a4)
        vmax.vv v8, v10, v8
        vmax.vv v9, v11, v9
        addi    a1, a1, -32
        addi    a5, a5, 128
        bnez    a1, .LBB0_7
        vmax.vv v8, v8, v9
        lui     a1, 524288
        vmv.s.x v9, a1
        vredmax.vs      v8, v8, v9
        vmv.x.s a1, v8
        beq     a3, a2, .LBB0_13
        andi    a4, a2, 24
        beqz    a4, .LBB0_14
```
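
To make observation 2 concrete, here is a rough scalar C model of the loop structure that would avoid the intermediate scalar reduction. It is only a sketch: plain arrays stand in for vector registers, the lane counts (16 for the main loop, 8 for the epilogue) are taken from the observation above, and the function name, macros, and helper are hypothetical. It is not what the vectorizer emits today.

```c
#include <assert.h>
#include <stddef.h>

#define MAIN_LANES 16 /* assumed main-loop width (16 x i32) */
#define EPI_LANES 8   /* assumed epilogue width (8 x i32) */

/* Hypothetical helper: scalar max of two ints. */
static int max_i32(int x, int y) { return x > y ? x : y; }

/* Hypothetical model of the desired structure; arrays model vector registers. */
int vector_max_reduce_i32_sketch(const int *a, unsigned a_len) {
  assert(a != NULL || a_len == 0);
  const int init = -987654321; /* same start value as the reproducer */
  unsigned i = 0;

  /* Main vector loop: lane-wise max only, no cross-lane work. */
  int acc[MAIN_LANES];
  for (unsigned l = 0; l < MAIN_LANES; l++)
    acc[l] = init;
  for (; a_len - i >= MAIN_LANES; i += MAIN_LANES)
    for (unsigned l = 0; l < MAIN_LANES; l++)
      acc[l] = max_i32(acc[l], a[i + l]);

  /* Partial reduction from 16 lanes to 8 lanes; the result is still a vector. */
  int epi[EPI_LANES];
  for (unsigned l = 0; l < EPI_LANES; l++)
    epi[l] = max_i32(acc[l], acc[l + EPI_LANES]);

  /* Vector epilogue continues on the 8-lane accumulator. */
  for (; a_len - i >= EPI_LANES; i += EPI_LANES)
    for (unsigned l = 0; l < EPI_LANES; l++)
      epi[l] = max_i32(epi[l], a[i + l]);

  /* The only vector-to-scalar reduction, deferred until the end. */
  int max = init;
  for (unsigned l = 0; l < EPI_LANES; l++)
    max = max_i32(max, epi[l]);

  /* Scalar remainder for the last (a_len % 8) elements. */
  for (; i < a_len; i++)
    max = max_i32(max, a[i]);
  return max;
}
```

The point of the sketch is that the 16-to-8 step is a lane-wise max of the two halves of the accumulator, so a single scalar reduction after the epilogue is sufficient.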

</pre>