[llvm] GlobalISel needs fdiv 1 / sqrt(x) to rsq combine (PR #78673)

Thu Feb 8 01:37:18 PST 2024

================
@@ -334,6 +337,45 @@ bool AMDGPUPostLegalizerCombinerImpl::matchRcpSqrtToRsq(
   return false;
 }
 
+bool AMDGPUPostLegalizerCombinerImpl::matchFDivSqrtToRsq(
+    MachineInstr &MI) const {
+  Register Dst = MI.getOperand(0).getReg();
+  Register Sqrt = MI.getOperand(2).getReg();
+  LLT DstTy = MRI.getType(Dst);
+  const MachineFunction &MF = B.getMF();
+  bool AllowInaccurateRsq =
+      MI.getFlag(MachineInstr::FmAfn) || MF.getTarget().Options.UnsafeFPMath;
+  if (!MRI.hasOneUse(Sqrt)) {
+    return false;
+  }
+  // v_rcp_f32 and v_rsq_f32 do not support denormals, and according to
+  // the CI documentation has a worst case error of 1 ulp.
+  // OpenCL requires <= 2.5 ulp for 1.0 / x, so it should always be OK to
+  // use it as long as we aren't trying to use denormals.
+  //
+  // v_rcp_f16 and v_rsq_f16 DO support denormals and 0.51ulp.
+  if (!AllowInaccurateRsq && DstTy != LLT::scalar(16)) {
+    return false;
+  }
+  // What about v_rsq_f64? - Is UnsafeFPMath sufficient to do this for f64? The
+  // maximum ULP error seems really high at 2^29 ULP.
+  return true;
----------------
arsenm wrote:

Don't allow f64. We've never done it, and I believe we should add extra correction code when we do use it. It's been on my todo list for a long time to try to make use of it. Currently the library code has an expansion that we should move into the compiler.
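
For illustration only, the gating requested here could be modeled as the standalone sketch below. It is not the patch's code and uses no LLVM APIs: `FPType`, `allowFDivSqrtToRsq`, and its parameters are hypothetical names standing in for the `LLT` check, the `afn` flag, `UnsafeFPMath`, and the one-use check in the hunk above.

```cpp
// Hypothetical model of the combine's legality check; not LLVM code.
enum class FPType { F16, F32, F64 };

constexpr bool allowFDivSqrtToRsq(FPType Ty, bool HasAfn, bool UnsafeFPMath,
                                  bool SqrtHasOneUse) {
  // The sqrt result must have a single use, as in the quoted hunk.
  if (!SqrtHasOneUse)
    return false;
  // Per the review: never form rsq for f64, regardless of fast-math flags.
  if (Ty == FPType::F64)
    return false;
  // f16 is accepted without fast-math flags, mirroring the hunk's
  // DstTy != LLT::scalar(16) exemption.
  if (Ty == FPType::F16)
    return true;
  // f32 requires afn or unsafe-fp-math.
  return HasAfn || UnsafeFPMath;
}

// Compile-time checks of the intended behavior.
static_assert(!allowFDivSqrtToRsq(FPType::F64, true, true, true),
              "f64 is rejected even with afn and unsafe-fp-math");
static_assert(allowFDivSqrtToRsq(FPType::F16, false, false, true),
              "f16 is accepted without fast-math flags");
static_assert(!allowFDivSqrtToRsq(FPType::F32, false, false, true),
              "f32 needs afn or unsafe-fp-math");
```

Placing the f64 rejection before the fast-math test keeps `UnsafeFPMath` from enabling the f64 case, which is the open question raised in the hunk's last comment.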

https://github.com/llvm/llvm-project/pull/78673