[llvm] [LV] Vectorize FMax w/o fast-math flags. (PR #146711)

Sun Jul 13 04:17:51 PDT 2025

================
@@ -47,6 +47,9 @@ enum class RecurKind {
   FMul,     ///< Product of floats.
   FMin,     ///< FP min implemented in terms of select(cmp()).
   FMax,     ///< FP max implemented in terms of select(cmp()).
+  FCmpOGTSelect, ///< FP max implemented in terms of select(cmp()), but without
+                 /// any fast-math flags. Users need to handle NaNs and signed
+                 /// zeros when generating code.
----------------
fhahn wrote:

> Pattern and how ("users need") to handle it should indeed be explained, but preferably elsewhere. Pattern seems to suggest how to handle FP reductions (min, max, possibly others as well?) in the presence of NaNs and/or signed zeroes (both equally challenging?), which is evaded in the presence of certain fast-math flags (namely absence of nans and signed zeroes?).
> 

The explanation is currently interleaved in `handleFMaxReductionsWithoutFastMath`; should it go elsewhere?

> Does the following sound right: a. If the set is NaN-free, its reduction result is as with the fast-math flag. 

Yep, provided it is also free of signed zeros.

> b. If the set contains only NaN's, its reduction is either NaN or the initial value, depending on the reduction operation being unordered or ordered, respectively. 

Yep, updated to require ordered predicates, so the result would be the start value if all inputs are NaN.

> c. If the set contains both NaN's and non-NaN's, its reduction is either NaN or the reduction of all non-NaN's, depending on the reduction operation being unordered or ordered, respectively.
> 

Yep, restricted to just ordered predicates for now.

> The vector of partial subset reduction results of case (a) contain only non NaN's, and is subject to standard final reduction. In case (b), this vector holds only NaN's or only the initial value, which provides the respective final value. Case (c) requires "tie breaking" based on index? What if the initial value is NaN?

Yep for cases a) and b). 

For case c), if there is any non-NaN value (either the start value or any value in the loop), the reduction result is non-NaN. If any lane in the partial reduction vector is non-NaN, it will get selected.
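
To make cases b) and c) concrete, here is a minimal scalar model of the ordered `select(fcmp ogt)` pattern. It is only a sketch, not code from the patch: the function name `fmaxOrderedSelect` is hypothetical, the start value is assumed to be non-NaN, and it must be compiled without fast-math so that `>` stays an ordered comparison.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical scalar model of the reduction: max = (x > max) ? x : max.
// On floats, '>' is an ordered comparison: it is false if either operand
// is NaN, so a NaN element never replaces the running maximum.
float fmaxOrderedSelect(float Start, const std::vector<float> &Xs) {
  // Assumption of this sketch: the start value is not NaN.
  assert(!std::isnan(Start) && "sketch assumes a non-NaN start value");
  float Max = Start;
  for (float X : Xs)
    Max = (X > Max) ? X : Max;
  return Max;
}
```

With a non-NaN start value, an all-NaN input leaves the start value untouched (case b), and a mixed input yields the maximum of the start value and the non-NaN elements (case c).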

The tie-breaking is mainly needed for signed zeros, where we need to pick the first one seen. Without tie-breaking, a horizontal fmax will return +0.0 if the vector contains both -0.0 and +0.0; but if -0.0 was seen first, it must be the one selected, so the final selection has to go by index.
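
As an illustration of the index-based tie-breaking, the sketch below simulates a lane-wise reduction in plain C++ and compares it against the scalar loop. It is not the code generated by the patch: `LaneState`, `scalarFMax`, `lanewiseFMax`, and the use of index -1 for the start value are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-lane state: the running maximum and the position at
// which it was first seen. Index -1 marks the start value, which precedes
// every loop element.
struct LaneState {
  float Val;
  long Idx;
};

// Scalar reference: max = select(fcmp ogt x, max), x, max).
float scalarFMax(float Start, const std::vector<float> &Xs) {
  float Max = Start;
  for (float X : Xs)
    Max = (X > Max) ? X : Max;
  return Max;
}

// Simulated vectorized reduction with VF interleaved lanes.
float lanewiseFMax(float Start, const std::vector<float> &Xs, std::size_t VF) {
  assert(VF > 0);
  std::vector<LaneState> Lanes(VF, LaneState{Start, -1});
  for (std::size_t I = 0; I < Xs.size(); ++I) {
    LaneState &L = Lanes[I % VF];
    // Strictly greater only, so each lane keeps its first maximum.
    if (Xs[I] > L.Val)
      L = LaneState{Xs[I], static_cast<long>(I)};
  }
  // Final reduction: the largest value wins; among values that compare
  // equal (including -0.0 and +0.0), the smallest index wins.
  LaneState Best = Lanes[0];
  for (std::size_t J = 1; J < VF; ++J) {
    const LaneState &L = Lanes[J];
    if (L.Val > Best.Val || (L.Val == Best.Val && L.Idx < Best.Idx))
      Best = L;
  }
  return Best.Val;
}
```

For the input `{-0.0, +0.0}` with start value `-1.0` and `VF = 2`, the two lanes hold `-0.0` (index 0) and `+0.0` (index 1). They compare equal, so the lower index wins and the result is `-0.0`, matching the scalar loop.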


https://github.com/llvm/llvm-project/pull/146711