<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/55447>55447</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[RISCV] Poor vector codegen for integer max reduction with Zvl512b
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
preames
</td>
</tr>
</table>
<pre>
https://godbolt.org/z/8vj83rj14
Here's a slightly simplified reproducer:
```
$ cat vector_max_reduce.c
int vector_max_reduce_i32(int* a, unsigned a_len) {
  int max = -987654321;
  for (unsigned i = 0; i < a_len; i++)
    max = (a[i] > max) ? a[i] : max;
  return max;
}
$ clang -S vector_max_reduce.c --target=riscv64 -mllvm -riscv-v-vector-bits-min=512 -Xclang -target-feature -Xclang +v,+f,+m,+d,+zba -O2 -emit-llvm
```
Key observations here:
1. We only get the double vector loop structure (a vectorized main loop followed by a vectorized epilogue loop) with Zvl512b and above. Below that, we generate a much more reasonable single vector loop followed by a scalar epilogue. The reduction opcode (max vs. add) does not seem to influence this choice.
2. Even if we do generate the vector epilogue, we don't need a full vector-to-scalar reduction between the two loops. We do need a partial reduction (in this case from 16 lanes to 8 lanes), but we'd be better off deferring the final scalar reduction until the end.
3. In the final assembly (key block copied below), we seem to have folded the scalar reduction step back into the main vector body. Note that this is not the IR form!
```
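        addi a4, a5, 64           # a4 = a5 + 64 bytes (second vector of this iteration)
        vle32.v v10, (a5)         # load 16 x i32
        vle32.v v11, (a4)         # load the next 16 x i32
        vmax.vv v8, v10, v8       # accumulator 0 (loop is interleaved by 2)
        vmax.vv v9, v11, v9       # accumulator 1
        addi a1, a1, -32          # 32 elements consumed per iteration
        addi a5, a5, 128          # advance the pointer by 128 bytes
        bnez a1, .LBB0_7
        vmax.vv v8, v8, v9        # combine the two accumulators
        lui a1, 524288            # a1 = 0x80000000 sign-extended, i.e. INT32_MIN
        vmv.s.x v9, a1            # v9[0] = INT32_MIN, the identity for signed max
        vredmax.vs v8, v8, v9     # v8[0] = max(v9[0], all lanes of v8)
        vmv.x.s a1, v8            # a1 = scalar result of the reduction
        beq a3, a2, .LBB0_13
        andi a4, a2, 24
        beqz a4, .LBB0_14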
```
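For illustration, here is a scalar C model of the loop shape suggested in observation 2. It is a sketch, not what LLVM emits: plain arrays stand in for vector registers, VLEN=512 is assumed so one e32 register holds 16 lanes, the vector epilogue is assumed to use 8 lanes (consistent with the 16 -> 8 lane partial reduction above), and the 2x interleaving visible in the assembly is omitted. `model_deferred_reduce`, `max_i32`, `MAIN_VF` and `EPI_VF` are hypothetical names. The point is that the only vector-to-scalar step happens once, after the vector epilogue.

```c
/* Reproducer function, repeated here so the block compiles on its own. */
int vector_max_reduce_i32(int *a, unsigned a_len) {
  int max = -987654321;
  for (unsigned i = 0; i < a_len; i++)
    max = (a[i] > max) ? a[i] : max;
  return max;
}

#define MAIN_VF 16 /* assumed: 512-bit register / 32-bit lanes */
#define EPI_VF 8   /* assumed width of the vector epilogue */

static int max_i32(int x, int y) { return x > y ? x : y; }

/* Hypothetical model: arrays stand in for vector registers. */
int model_deferred_reduce(const int *a, unsigned a_len) {
  const int start = -987654321;
  int acc16[MAIN_VF];
  /* max is idempotent, so splatting the start value into every lane is safe. */
  for (int l = 0; l < MAIN_VF; l++)
    acc16[l] = start;

  unsigned i = 0;
  /* Main vector loop: one lane-wise max per 16-element chunk. */
  for (; i + MAIN_VF <= a_len; i += MAIN_VF)
    for (int l = 0; l < MAIN_VF; l++)
      acc16[l] = max_i32(acc16[l], a[i + l]);

  /* Partial reduction 16 -> 8 lanes: fold the upper half into the lower
     half. No scalar is produced at this point. */
  int acc8[EPI_VF];
  for (int l = 0; l < EPI_VF; l++)
    acc8[l] = max_i32(acc16[l], acc16[l + EPI_VF]);

  /* Vector epilogue: 8 lanes at a time. */
  for (; i + EPI_VF <= a_len; i += EPI_VF)
    for (int l = 0; l < EPI_VF; l++)
      acc8[l] = max_i32(acc8[l], a[i + l]);

  /* The single vector-to-scalar reduction, deferred until here. */
  int result = acc8[0];
  for (int l = 1; l < EPI_VF; l++)
    result = max_i32(result, acc8[l]);

  /* Scalar epilogue for the remaining (fewer than 8) elements. */
  for (; i < a_len; i++)
    result = max_i32(result, a[i]);
  return result;
}
```

Splatting the start value across all lanes only works because max is idempotent; for an add reduction the start value would have to go into a single lane with 0 in the others.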
</pre>