[llvm] [RISCV] Generalize reduction tree matching to fp sum reductions (PR #68599)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 9 10:56:45 PDT 2023
================
@@ -764,6 +764,165 @@ define i32 @reduce_umin_16xi32_prefix5(ptr %p) {
%umin3 = call i32 @llvm.umin.i32(i32 %umin2, i32 %e4)
ret i32 %umin3
}
+
+define float @reduce_fadd_16xf32_prefix2(ptr %p) {
+; CHECK-LABEL: reduce_fadd_16xf32_prefix2:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT: vle32.v v8, (a0)
+; CHECK-NEXT: vmv.s.x v9, zero
+; CHECK-NEXT: vfredusum.vs v8, v8, v9
+; CHECK-NEXT: vfmv.f.s fa0, v8
+; CHECK-NEXT: ret
+ %v = load <16 x float>, ptr %p, align 256
+ %e0 = extractelement <16 x float> %v, i32 0
+ %e1 = extractelement <16 x float> %v, i32 1
+ %fadd0 = fadd fast float %e0, %e1
+ ret float %fadd0
+}
+
+define float @reduce_fadd_16xi32_prefix5(ptr %p) {
+; CHECK-LABEL: reduce_fadd_16xi32_prefix5:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a1, 524288
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vle32.v v8, (a0)
+; CHECK-NEXT: vmv.s.x v10, a1
+; CHECK-NEXT: vsetivli zero, 6, e32, m2, tu, ma
+; CHECK-NEXT: vslideup.vi v8, v10, 5
+; CHECK-NEXT: vsetivli zero, 7, e32, m2, tu, ma
+; CHECK-NEXT: vslideup.vi v8, v10, 6
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vslideup.vi v8, v10, 7
+; CHECK-NEXT: vfredusum.vs v8, v8, v10
----------------
lukel97 wrote:
I wonder if we could improve the legalisation here by using VP nodes (something like https://reviews.llvm.org/D148523 but for reductions) to avoid having to pad out the vector with zeroes. Or it looks like there could also be a combine to replace all these inserts with a splat and single slide.
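For illustration, a rough sketch of how the 5-element prefix might be expressed as a VP reduction at the IR level. This is only an assumption about the shape of the idea, not what the patch or the backend currently does, and the function name is hypothetical. The explicit vector length of 5 stands in for the padded lanes, and -0.0 is the neutral start value for fadd:

```llvm
; Hypothetical sketch: sum the first 5 of 16 floats without padding lanes 5..7.
declare float @llvm.vp.reduce.fadd.v8f32(float, <8 x float>, <8 x i1>, i32)

define float @reduce_fadd_prefix5_vp_sketch(ptr %p) {
  %v = load <16 x float>, ptr %p, align 256
  ; Narrow to the smallest power-of-two vector that covers the prefix.
  %lo = shufflevector <16 x float> %v, <16 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  ; All-true mask; the EVL operand (5) restricts the reduction to lanes 0..4.
  %sum = call reassoc float @llvm.vp.reduce.fadd.v8f32(float -0.0, <8 x float> %lo, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, i32 5)
  ret float %sum
}
```

On RVV the EVL operand would presumably map onto VL for the vfredusum.vs, which is what would make the three vslideup.vi inserts unnecessary.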
https://github.com/llvm/llvm-project/pull/68599