[llvm] [InstCombine] Transform (fcmp + fadd + sel) into (fcmp + sel + fadd) (PR #106492)

Tue Sep 3 23:21:12 PDT 2024

================
@@ -3668,6 +3668,51 @@ static bool hasAffectedValue(Value *V, SmallPtrSetImpl<Value *> &Affected,
   return false;
 }
 
+static Value *foldSelectAddConstant(SelectInst &SI,
+                                    InstCombiner::BuilderTy &Builder) {
+  Value *Cmp;
+  Instruction *FAdd;
+  ConstantFP *C;
+
+  // select((fcmp OGT/OLT, X, 0), (fadd X, C), C) => fadd((select (fcmp OGT/OLT,
+  // X, 0), X, 0), C)
+
+  // This transformation enables the possibility of transforming fcmp + sel into
+  // a fmax/fmin.
+
+  // OneUse check for `Cmp` is necessary because it makes sure that other
+  // InstCombine folds don't undo this transformation and cause an infinite
+  // loop.
+  if (match(&SI, m_Select(m_OneUse(m_Value(Cmp)), m_OneUse(m_Instruction(FAdd)),
+                          m_ConstantFP(C))) ||
+      match(&SI, m_Select(m_OneUse(m_Value(Cmp)), m_ConstantFP(C),
+                          m_OneUse(m_Instruction(FAdd))))) {
+    Value *X;
+    CmpInst::Predicate Pred;
+    if (!match(Cmp, m_FCmp(Pred, m_Value(X), m_AnyZeroFP())))
+      return nullptr;
+
+    if (Pred != CmpInst::FCMP_OGT && Pred != CmpInst::FCMP_OLT)
----------------
rajatbajpai wrote:

> But even so I think this is a useful transform even if it doesn't result in a min/max instruction (it will with additional nnan context)

It seems we already have this transformation for the `Value` type, but it doesn't work for the `Constant` type: https://godbolt.org/z/bWrozq6o1

This is partly why I was only looking at the `Constant` case.
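
For reference, the fold in the hunk above can be modelled on scalar floats roughly as follows. This is only a sketch in plain C++, not LLVM IR: the function names `selectOfAdd` and `addOfSelect` are made up for illustration, only the OGT form is shown, and fast-math flags and signed-zero details (for example `C == -0.0`) are not modelled.

```cpp
#include <cmath>

// Shape before the fold: select (fcmp ogt X, 0), (fadd X, C), C
float selectOfAdd(float X, float C) {
  return X > 0.0f ? X + C : C;
}

// Shape after the fold: fadd (select (fcmp ogt X, 0), X, 0), C
// The inner select is the part that could later become a max-like operation.
float addOfSelect(float X, float C) {
  float Clamped = X > 0.0f ? X : 0.0f;
  return Clamped + C;
}
```

For ordinary inputs both shapes return the same value. A NaN `X` makes the ordered comparison false in both, so both return `C` (assuming `C` is a finite non-zero constant).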

> You cannot simply turn these into min/max instructions, it's more complicated than that. For the minnum/maxnum-like cases, you need to know non-nan (or more specifically, not signaling nan). On old AMDGPU targets we did have instructions that followed the fcmp+select pattern, but they were unfortunately removed. You do see them used here: https://godbolt.org/z/efY38qe18

I see, but why does `test_ogt` use `v_max_f32_e32` while `test_oge` uses `v_max_legacy_f32_e64`?
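
As a side illustration of the NaN concern in the quoted comment (it does not answer the AMDGPU question), here is a minimal sketch in plain C++ of how an fcmp+select "max" differs from a library max when one operand is NaN. The names `selectMax` and `libMax` are hypothetical, and `std::fmax` is used only as a stand-in for maxnum-like behaviour with quiet NaNs; signaling NaNs are not modelled.

```cpp
#include <cmath>

// fcmp ogt + select: an ordered compare is false if either input is NaN,
// so the result is always B in that case (which may itself be NaN).
float selectMax(float A, float B) { return A > B ? A : B; }

// std::fmax returns the non-NaN operand when exactly one operand is NaN.
float libMax(float A, float B) { return std::fmax(A, B); }
```

With these definitions, `selectMax(NaN, 1.0f)` yields `1.0f` but `selectMax(1.0f, NaN)` yields NaN, whereas `libMax` yields `1.0f` for both operand orders. That asymmetry is one reason the select pattern cannot be replaced by a min/max without extra NaN information.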

https://github.com/llvm/llvm-project/pull/106492