[llvm] [InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL (PR #87474)

Tue Apr 9 00:53:24 PDT 2024

================
@@ -705,19 +828,20 @@ Instruction *InstCombinerImpl::foldFMulReassoc(BinaryOperator &I) {
   // has the necessary (reassoc) fast-math-flags.
   if (I.hasNoSignedZeros() &&
       match(Op0, (m_FDiv(m_SpecificFP(1.0), m_Value(Y)))) &&
-      match(Y, m_Sqrt(m_Value(X))) && Op1 == X)
+      match(Y, m_Sqrt(m_Value(X))) && Op1 == X && !delayFMulSqrtTransform(Op0))
----------------
nikic wrote:

Why are these necessary? I'd expect the fdiv to get visited before the fmuls. Does your motivating case require them?

Also note that this kind of check is fundamentally unreliable due to multiple InstCombine runs. If the first InstCombine run does this transform and the second one would potentially be able to do the sqrt/div transform, then this check won't help.
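
To make the multi-run concern concrete, here is a toy model of it. This is a sketch, not LLVM code: `ToyFunction`, `shouldDelayFMulFold`, `runToyInstCombine` and `sqrtDivFiresAfterTwoRuns` are all hypothetical names. It assumes the guard can only inspect the IR as it looks during the current run, and that something between the two runs makes the sqrt/div pattern matchable.

```cpp
// Toy model only: none of these types or functions exist in LLVM.
struct ToyFunction {
  bool HasRecipTimesX = true;         // (1.0 / sqrt(X)) * X is still present
  bool SqrtDivPatternVisible = false; // sqrt/div pattern is matchable right now
  bool SqrtDivTransformDone = false;  // the new sqrt/div transform has fired
};

// Hypothetical stand-in for a delay guard. It can only look at the IR as it
// exists during the current run.
bool shouldDelayFMulFold(const ToyFunction &F) {
  return F.SqrtDivPatternVisible;
}

// One simplified "InstCombine run" over the toy function.
void runToyInstCombine(ToyFunction &F) {
  if (!F.HasRecipTimesX)
    return; // The fmul is already gone, so neither transform can match.

  if (shouldDelayFMulFold(F)) {
    // The guard held the fmul fold back, so the sqrt/div transform fires.
    F.SqrtDivTransformDone = true;
    F.HasRecipTimesX = false;
    return;
  }

  // The guard saw nothing to protect: fold to X / sqrt(X).
  F.HasRecipTimesX = false;
}

// Runs the toy combiner twice and reports whether the sqrt/div transform
// fired. Between the runs, some other pass is assumed to expose the pattern.
bool sqrtDivFiresAfterTwoRuns(bool VisibleInFirstRun) {
  ToyFunction F;
  F.SqrtDivPatternVisible = VisibleInFirstRun;
  runToyInstCombine(F);           // first run
  F.SqrtDivPatternVisible = true; // pattern becomes matchable after run 1
  runToyInstCombine(F);           // second run
  return F.SqrtDivTransformDone;
}
```

In this model, `sqrtDivFiresAfterTwoRuns(true)` returns `true`, while `sqrtDivFiresAfterTwoRuns(false)` returns `false`: when the pattern only becomes visible after the first run, the fmul has already been folded and the guard has nothing left to delay.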

https://github.com/llvm/llvm-project/pull/87474