[llvm] [InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL (PR #87474)

Wed Jan 8 11:31:16 PST 2025

================
@@ -626,6 +626,88 @@ Instruction *InstCombinerImpl::foldPowiReassoc(BinaryOperator &I) {
   return nullptr;
 }
 
+// Check legality for transforming
+// x = 1.0/sqrt(a)
+// r1 = x * x;
+// r2 = a/sqrt(a);
+//
+// TO
+//
+// r1 = 1/a
+// r2 = sqrt(a)
+// x = r1 * r2
+static bool isFSqrtDivToFMulLegal(Instruction *X, ArrayRef<Instruction *> R1,
+                                  ArrayRef<Instruction *> R2) {
+  BasicBlock *BBx = X->getParent();
+  BasicBlock *BBr1 = R1[0]->getParent();
+  BasicBlock *BBr2 = R2[0]->getParent();
+
+  CallInst *FSqrt = cast<CallInst>(X->getOperand(1));
+  if (!FSqrt->hasAllowReassoc() || !FSqrt->hasNoNaNs() ||
+      !FSqrt->hasNoSignedZeros() || !FSqrt->hasNoInfs())
----------------
andykaylor wrote:

The nnan restriction severely limits the usefulness of this transformation, so I think it's worth trying to find alternative ways for the transformation to trigger. I understand that you want it to trigger for cases where x, r1, and r2 have an arbitrary number of users, but if the nnan condition isn't met, you could still perform the transformation if r1 is the only user of x. In addition, you could check isKnownNonNegative() for a.

On the other hand, the ninf requirement is similarly restrictive, so maybe it's just necessary to accept the limitations of this transformation. Since you mentioned CPU2017 in your description, I would mention that ninf can't be used with all CPU2017 benchmarks. In particular, it breaks povray. That's not to say this transformation isn't general enough. I'm just highlighting the limitation.
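
For illustration only (this is not part of the patch): a minimal sketch in plain IEEE double arithmetic of where the two forms from the code comment disagree, which is roughly why the nnan and ninf flags come up. The names `Results`, `original`, and `transformed` are hypothetical helpers, and the sketch assumes the code is compiled without fast-math options. It does not model how LLVM treats fast-math flags.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper type: the three values named in the patch comment.
struct Results {
  double x, r1, r2;
};

// Form before the transformation:
//   x = 1.0 / sqrt(a); r1 = x * x; r2 = a / sqrt(a)
Results original(double a) {
  double x = 1.0 / std::sqrt(a);
  return {x, x * x, a / std::sqrt(a)};
}

// Form after the transformation:
//   r1 = 1 / a; r2 = sqrt(a); x = r1 * r2
Results transformed(double a) {
  double r1 = 1.0 / a;
  double r2 = std::sqrt(a);
  return {r1 * r2, r1, r2};
}
```

Under those assumptions, both forms agree for an ordinary positive input such as a = 4.0 (x = 0.5, r1 = 0.25, r2 = 2.0). For a = -4.0 the original r1 is NaN while the transformed r1 is -0.25, which is the case a non-negativity check on a would rule out. For a = +infinity the original x is 0.0 while the transformed x is 0 * infinity, i.e. NaN.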

https://github.com/llvm/llvm-project/pull/87474