[llvm] [RISCV] Form vredsum from explode_vector + scalar (left) reduce (PR #67821)

Fri Sep 29 08:59:12 PDT 2023

================
@@ -413,47 +159,21 @@ define i32 @reduce_sum_16xi32_prefix4(ptr %p) {
 }
 
 define i32 @reduce_sum_16xi32_prefix5(ptr %p) {
-; RV32-LABEL: reduce_sum_16xi32_prefix5:
-; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
-; RV32-NEXT:    vle32.v v8, (a0)
-; RV32-NEXT:    vmv.x.s a0, v8
-; RV32-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
-; RV32-NEXT:    vslidedown.vi v10, v8, 1
-; RV32-NEXT:    vmv.x.s a1, v10
-; RV32-NEXT:    vslidedown.vi v10, v8, 2
-; RV32-NEXT:    vmv.x.s a2, v10
-; RV32-NEXT:    vslidedown.vi v10, v8, 3
-; RV32-NEXT:    vmv.x.s a3, v10
-; RV32-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
-; RV32-NEXT:    vslidedown.vi v8, v8, 4
-; RV32-NEXT:    vmv.x.s a4, v8
-; RV32-NEXT:    add a0, a0, a1
-; RV32-NEXT:    add a2, a2, a3
-; RV32-NEXT:    add a0, a0, a2
-; RV32-NEXT:    add a0, a0, a4
-; RV32-NEXT:    ret
-;
-; RV64-LABEL: reduce_sum_16xi32_prefix5:
-; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
-; RV64-NEXT:    vle32.v v8, (a0)
-; RV64-NEXT:    vmv.x.s a0, v8
-; RV64-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
-; RV64-NEXT:    vslidedown.vi v10, v8, 1
-; RV64-NEXT:    vmv.x.s a1, v10
-; RV64-NEXT:    vslidedown.vi v10, v8, 2
-; RV64-NEXT:    vmv.x.s a2, v10
-; RV64-NEXT:    vslidedown.vi v10, v8, 3
-; RV64-NEXT:    vmv.x.s a3, v10
-; RV64-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
-; RV64-NEXT:    vslidedown.vi v8, v8, 4
-; RV64-NEXT:    vmv.x.s a4, v8
-; RV64-NEXT:    add a0, a0, a1
-; RV64-NEXT:    add a2, a2, a3
-; RV64-NEXT:    add a0, a0, a2
-; RV64-NEXT:    addw a0, a0, a4
-; RV64-NEXT:    ret
+; CHECK-LABEL: reduce_sum_16xi32_prefix5:
----------------
preames wrote:

Just to flag: we could definitely do better on the lowering for illegally typed prefix vectors. I think this makes sense to land as is, because the current result is better than the scalar tree. We could explore using either a select with splat(zero) here, a masked reduce, or a vl toggle for the reduce.
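
To make the three options concrete, here is a rough scalar model in Python of what each would compute for the 5-element prefix of a 16 x i32 vector. This is only a sketch of the intended semantics, not the actual lowering: every function name below is hypothetical, and i32 wraparound is modelled with a 32-bit mask.

```python
MASK32 = 0xFFFFFFFF  # model i32 wraparound arithmetic


def scalar_prefix_sum(v, n):
    """Reference: add the first n lanes one by one, like the scalar add tree."""
    total = 0
    for x in v[:n]:
        total = (total + x) & MASK32
    return total


def select_splat_zero_reduce(v, n):
    """Option 1 (assumed shape): select(lane < n, v, splat(0)), then a full-width reduce."""
    selected = [x if i < n else 0 for i, x in enumerate(v)]
    return sum(selected) & MASK32


def masked_reduce(v, n):
    """Option 2 (assumed shape): reduce only the lanes whose mask bit is set."""
    mask = [i < n for i in range(len(v))]
    return sum(x for x, m in zip(v, mask) if m) & MASK32


def vl_limited_reduce(v, n):
    """Option 3 (assumed shape): set vl = n so the reduce sees only the first n lanes."""
    vl = n
    return sum(v[:vl]) & MASK32


if __name__ == "__main__":
    lanes = list(range(1, 17))  # stand-in for the loaded <16 x i32>
    for f in (scalar_prefix_sum, select_splat_zero_reduce,
              masked_reduce, vl_limited_reduce):
        print(f.__name__, f(lanes, 5))
```

All three variants agree with the scalar tree because zero is the identity for add, so lanes past the prefix can be zeroed, masked off, or excluded via vl without changing the sum.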

https://github.com/llvm/llvm-project/pull/67821