[llvm] [InstCombine] Transform high latency, dependent FSQRT/FDIV into FMUL (PR #87474)

Wed Jan 8 11:31:15 PST 2025

================
@@ -666,6 +667,90 @@ Instruction *InstCombinerImpl::foldPowiReassoc(BinaryOperator &I) {
   return nullptr;
 }
 
+// Check legality for transforming
+// x = 1.0/sqrt(a)
+// r1 = x * x;
+// r2 = a/sqrt(a);
----------------
andykaylor wrote:

Excuse me if this was covered in one of the many resolved conversations, but it's not clear to me why you're transforming a/sqrt(a) as part of this pattern. Is it because you need to hoist r2 into the same block as x?

I see that in InstCombinerImpl::foldFMulReassoc() we are transforming a number of patterns into x/sqrt(x), with a comment that the backend is expected to transform that into sqrt(x) if the necessary fast-math flags are present. I'm not sure why that's being left to the backend, but either way I don't see any reason to perform the transformation here. If you want InstCombine to do that, it could just as easily be an independent transformation.

It may be that this is a case where we decided we needed the "unsafe-fp-math" function attribute because none of the individual fast-math flags clearly allows it. @jcranmer-intel has been working on clarifying the semantics of these flags and might have more to say on this. I think it's definitely a transformation we want to allow, but I would argue that it requires more than just reassoc.
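
As a rough illustration of why a/sqrt(a) -> sqrt(a) is more than a reordering of operations, here is a small standalone sketch in plain C++ (not LLVM code; the helper names `divBySqrt`, `plainSqrt`, and `checkEdgeCases` are made up for this example). It assumes default IEEE-754 semantics with no fast-math options enabled, and shows that the two forms disagree for a = 0 and a = +inf:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helpers, named only for this illustration.
// The original expression from the pattern: r2 = a / sqrt(a).
double divBySqrt(double a) { return a / std::sqrt(a); }

// The simplified form the transformation would produce.
double plainSqrt(double a) { return std::sqrt(a); }

// Compare the two forms on a few inputs under strict IEEE-754 rules.
void checkEdgeCases() {
  const double inf = std::numeric_limits<double>::infinity();

  // a = 0: 0/0 is NaN, but sqrt(0) is 0.
  assert(std::isnan(divBySqrt(0.0)));
  assert(plainSqrt(0.0) == 0.0);

  // a = +inf: inf/inf is NaN, but sqrt(inf) is +inf.
  assert(std::isnan(divBySqrt(inf)));
  assert(std::isinf(plainSqrt(inf)));

  // An ordinary input where both forms agree exactly.
  assert(divBySqrt(4.0) == 2.0);
  assert(plainSqrt(4.0) == 2.0);
}
```

Calling `checkEdgeCases()` from a program built without fast-math options should pass all assertions. This only demonstrates that the results differ at those inputs; which combination of flags should license the rewrite is the open question above.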

https://github.com/llvm/llvm-project/pull/87474