[llvm] [InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL (PR #87474)

via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 4 21:51:24 PDT 2024


================
@@ -626,6 +626,150 @@ Instruction *InstCombinerImpl::foldPowiReassoc(BinaryOperator &I) {
   return nullptr;
 }
 
+bool isFSqrtDivToFMulLegal(Instruction *X, SmallSetVector<Instruction *, 2> &R1,
+                           SmallSetVector<Instruction *, 2> &R2) {
+
+  BasicBlock *BBx = X->getParent();
+  BasicBlock *BBr1 = R1[0]->getParent();
+  BasicBlock *BBr2 = R2[0]->getParent();
+
+  auto IsStrictFP = [](Instruction *I) {
+    IntrinsicInst *II = dyn_cast<IntrinsicInst>(I);
+    if (II && II->isStrictFP())
+      return true;
+    return false;
+  };
+
+  // Check if X and the instructions in R1/R2 satisfy the basic block constraints.
+  auto BBConstraintsSatisfied = [BBx, BBr1, BBr2]() {
+    // The div instruction and one of the multiplications must reside in the
+    // same block. Otherwise, the optimized code may execute more ops than
+    // before, which may hamper performance.
+    if (!(BBx == BBr1 || BBx == BBr2))
+      return false;
+    return true;
+  };
+
+  // Check the constraints on instruction X.
+  auto XConstraintsSatisfied = [X, &IsStrictFP]() {
+    // X must have 3 uses in R1/R2 combined, plus 1 more use so that the
+    // replacement for X is not dead-code eliminated. If X has fewer than 4 uses, the
----------------
sushgokh wrote:

To clarify:
 x = 1/sqrt(a)
 r1 = x * x
 r2 = a * x

So there are 3 uses of 'x' across r1 and r2 combined.
 Once I replace r1 and r2 with the modified values, those uses of 'x' go away. If 'x' has no other use, it simply gets dead-code eliminated, and there is no need for this transformation.
 
Hence, at least 4 uses of 'x' are required.

Will try to rephrase the comment.
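To make the algebra and the use-count argument concrete, here is a minimal, hypothetical C++ sketch. It does not use LLVM's API: `Values`, `original`, `rewritten`, `closeEnough`, `ToyInst`, `countUses`, and `meetsUseCountRule` are illustrative names only. It covers just the single r1/r2 case from the example (the patch itself works on the sets R1/R2), and it assumes positive, finite inputs and fast-math style reassociation, under which (1/a) * sqrt(a) equals 1/sqrt(a). Uses are counted per operand slot, so x * x counts as two uses of x.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Values produced by the pattern in the example:
//   x = 1/sqrt(a); r1 = x * x; r2 = a * x
struct Values {
  double x;
  double r1;
  double r2;
};

// Original sequence: the div waits on the sqrt, and both multiplies wait on x.
Values original(double a) {
  assert(a > 0.0 && "sketch only considers positive, finite inputs");
  double x = 1.0 / std::sqrt(a);
  return {x, x * x, a * x};
}

// Rewritten sequence (assumes reassociation is allowed): 1/a and sqrt(a) are
// independent of each other, and x is rebuilt with a single multiply.
Values rewritten(double a) {
  assert(a > 0.0 && "sketch only considers positive, finite inputs");
  double r1 = 1.0 / a;       // replaces x * x
  double r2 = std::sqrt(a);  // replaces a * x
  return {r1 * r2, r1, r2};  // (1/a) * sqrt(a) == 1/sqrt(a)
}

// Relative-error comparison, since the two sequences round differently.
bool closeEnough(double lhs, double rhs, double relTol) {
  return std::fabs(lhs - rhs) <= relTol * std::fabs(rhs);
}

// Hypothetical toy instruction: a result name plus its operand names.
struct ToyInst {
  std::string result;
  std::vector<std::string> operands;
};

// Counts operand slots that reference `name` (x * x contributes 2).
int countUses(const std::vector<ToyInst>& insts, const std::string& name) {
  int uses = 0;
  for (const ToyInst& inst : insts)
    for (const std::string& op : inst.operands)
      if (op == name) ++uses;
  return uses;
}

// Rule from the comment, single r1/r2 case: r1 = x * x and r2 = a * x account
// for 3 uses of x. Without at least one more use, x would be dead after the
// rewrite, so the transformation is only applied with 4 or more uses in total.
bool meetsUseCountRule(int usesInR1R2, int totalUses) {
  return usesInR1R2 == 3 && totalUses >= 4;
}
```

For example, with r1 = x * x, r2 = a * x and one further use such as s = x + b, `countUses` reports 4 uses of x and `meetsUseCountRule(3, 4)` holds; without s there are only 3 uses and the rule rejects the rewrite, matching the argument above.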

https://github.com/llvm/llvm-project/pull/87474

