[PATCH] D87391: [Intrinsics] define semantics for experimental fmax/fmin vector reductions
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 9 10:40:59 PDT 2020
dmgreen added inline comments.
================
Comment at: llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll:247
+; CHECK-FP-NEXT: vminnm.f32 s0, s0, s1
+; CHECK-FP-NEXT: vminnm.f32 s0, s0, s4
; CHECK-FP-NEXT: bx lr
----------------
spatel wrote:
> nikic wrote:
> > This lowering looks incorrect for the case where both elements are NaN. We'll fold to +INF then. We probably have an expansion that assumes +INF is a neutral element for fminnum, but it isn't in the presence of NaNs :/
> Good catch - something in vector legalization does that:
> t15: v4f32 = insert_vector_elt t12, ConstantFP:f32<INF>, Constant:i32<2>
> t17: v4f32 = insert_vector_elt t15, ConstantFP:f32<INF>, Constant:i32<3>
> t18: f32 = vecreduce_fmin t17
>
I originally thought this was because we don't go through ExpandReductions, widening them in ISel instead. They do look like they get padded with +/- Inf in that case.
But we do expand pre-isel if NoNan isn't present in shouldExpandReduction. I looks like some of the expansion of min/max is unconditionally setting fast flags in llvm::createMinMaxOp. Unless I'm mistaken.
The padding with +/- inf is likely a problem on it's own right too.
================
Comment at: llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll:300
; CHECK-NOFP: @ %bb.0: @ %entry
-; CHECK-NOFP-NEXT: vcmp.f32 s7, s3
-; CHECK-NOFP-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NOFP-NEXT: vcmp.f32 s5, s1
-; CHECK-NOFP-NEXT: vselgt.f32 s8, s3, s7
-; CHECK-NOFP-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NOFP-NEXT: vcmp.f32 s6, s2
-; CHECK-NOFP-NEXT: vselgt.f32 s10, s1, s5
-; CHECK-NOFP-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NOFP-NEXT: vcmp.f32 s4, s0
-; CHECK-NOFP-NEXT: vselgt.f32 s12, s2, s6
-; CHECK-NOFP-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NOFP-NEXT: vcmp.f32 s8, s10
-; CHECK-NOFP-NEXT: vselgt.f32 s0, s0, s4
-; CHECK-NOFP-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NOFP-NEXT: vcmp.f32 s12, s0
-; CHECK-NOFP-NEXT: vselgt.f32 s2, s10, s8
-; CHECK-NOFP-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NOFP-NEXT: vselgt.f32 s0, s0, s12
-; CHECK-NOFP-NEXT: vcmp.f32 s2, s0
-; CHECK-NOFP-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NOFP-NEXT: vselgt.f32 s0, s0, s2
+; CHECK-NOFP-NEXT: vminnm.f32 s10, s0, s4
+; CHECK-NOFP-NEXT: vminnm.f32 s8, s1, s5
----------------
It seems like some of these are _better_ than the fast math versions! :)
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D87391/new/
https://reviews.llvm.org/D87391
More information about the llvm-commits
mailing list