[llvm] [RISCV] Generalize reduction tree matching to fp sum reductions (PR #68599)
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 9 11:31:22 PDT 2023
================
@@ -764,6 +764,165 @@ define i32 @reduce_umin_16xi32_prefix5(ptr %p) {
%umin3 = call i32 @llvm.umin.i32(i32 %umin2, i32 %e4)
ret i32 %umin3
}
+
+define float @reduce_fadd_16xf32_prefix2(ptr %p) {
+; CHECK-LABEL: reduce_fadd_16xf32_prefix2:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT: vle32.v v8, (a0)
+; CHECK-NEXT: vmv.s.x v9, zero
+; CHECK-NEXT: vfredusum.vs v8, v8, v9
+; CHECK-NEXT: vfmv.f.s fa0, v8
+; CHECK-NEXT: ret
+ %v = load <16 x float>, ptr %p, align 256
+ %e0 = extractelement <16 x float> %v, i32 0
+ %e1 = extractelement <16 x float> %v, i32 1
+ %fadd0 = fadd fast float %e0, %e1
+ ret float %fadd0
+}
+
+define float @reduce_fadd_16xi32_prefix5(ptr %p) {
+; CHECK-LABEL: reduce_fadd_16xi32_prefix5:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a1, 524288
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vle32.v v8, (a0)
+; CHECK-NEXT: vmv.s.x v10, a1
+; CHECK-NEXT: vsetivli zero, 6, e32, m2, tu, ma
+; CHECK-NEXT: vslideup.vi v8, v10, 5
+; CHECK-NEXT: vsetivli zero, 7, e32, m2, tu, ma
+; CHECK-NEXT: vslideup.vi v8, v10, 6
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vslideup.vi v8, v10, 7
+; CHECK-NEXT: vfredusum.vs v8, v8, v10
----------------
preames wrote:
There's a bunch of possible improvements here. Probably the best is to use a masked reduction.
TBH, the prefix cases aren't showing up in my motivating workloads (SPEC2017), so I'm not super worried about them. I wanted something correct and not terrible so that the incremental approach was viable, but that's currently about as far as I care.
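
As a rough illustration of the comment above (not code from the PR), the sketch below models the two strategies for the prefix5 case in Python. It assumes the `vslideup.vi` sequence in the quoted CHECK lines overwrites lanes 5 to 7 with the constant built by `lui a1, 524288`, which is the bit pattern of -0.0, the neutral element for fadd. The function names `padded_prefix_sum` and `masked_prefix_sum` are hypothetical, and Python floats are doubles, so this only models the arithmetic idea, not the RVV instructions.

```python
import struct

# `lui a1, 524288` places 524288 << 12 == 0x80000000 in a1.
# Reinterpreted as an IEEE-754 binary32 value, that is -0.0.
NEUTRAL_BITS = 524288 << 12
NEUTRAL = struct.unpack("<f", struct.pack("<I", NEUTRAL_BITS))[0]


def padded_prefix_sum(lanes, prefix):
    """Model of the current lowering: pad unused lanes with -0.0, reduce all lanes."""
    padded = list(lanes[:prefix]) + [NEUTRAL] * (len(lanes) - prefix)
    acc = NEUTRAL  # start value of the reduction
    for x in padded:
        acc += x
    return acc


def masked_prefix_sum(lanes, prefix):
    """Model of a masked reduction: inactive lanes are skipped, so no padding is needed."""
    mask = [i < prefix for i in range(len(lanes))]
    acc = NEUTRAL
    for x, active in zip(lanes, mask):
        if active:
            acc += x
    return acc


if __name__ == "__main__":
    lanes = [1.5, 2.0, -0.25, 4.0, 8.0, 100.0, 200.0, 300.0]
    print(padded_prefix_sum(lanes, 5), masked_prefix_sum(lanes, 5))
```

Both functions return the same value because adding -0.0 leaves every operand unchanged. The masked form gets there without the three slide-and-insert steps that the padded form stands in for.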
https://github.com/llvm/llvm-project/pull/68599