[llvm] GlobalISel needs fdiv 1 / sqrt(x) to rsq combine (PR #78673)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 8 01:37:18 PST 2024
================
@@ -334,6 +337,45 @@ bool AMDGPUPostLegalizerCombinerImpl::matchRcpSqrtToRsq(
return false;
}
+bool AMDGPUPostLegalizerCombinerImpl::matchFDivSqrtToRsq(
+    MachineInstr &MI) const {
+  Register Dst = MI.getOperand(0).getReg();
+  Register Sqrt = MI.getOperand(2).getReg();
+  LLT DstTy = MRI.getType(Dst);
+  const MachineFunction &MF = B.getMF();
+  bool AllowInaccurateRsq =
+      MI.getFlag(MachineInstr::FmAfn) || MF.getTarget().Options.UnsafeFPMath;
+  if (!MRI.hasOneUse(Sqrt)) {
+    return false;
+  }
+  // v_rcp_f32 and v_rsq_f32 do not support denormals, and according to
+  // the CI documentation has a worst case error of 1 ulp.
+  // OpenCL requires <= 2.5 ulp for 1.0 / x, so it should always be OK to
+  // use it as long as we aren't trying to use denormals.
+  //
+  // v_rcp_f16 and v_rsq_f16 DO support denormals and 0.51ulp.
+  if (!AllowInaccurateRsq && DstTy != LLT::scalar(16)) {
+    return false;
+  }
+  // What about v_rsq_f64? - Is UnsafeFPMath sufficient to do this for f64? The
+  // maximum ULP error seems really high at 2^29 ULP.
+  return true;
----------------
arsenm wrote:
Don't allow f64. We've never done it, and I believe we should add extra correction code when we do use it. It's been on my todo list for a long time to try to make use of it. Currently the library code has an expansion we should move into the compiler.
https://github.com/llvm/llvm-project/pull/78673