[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set

Wed Jan 22 07:18:47 PST 2020

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:600
+//
+// We can insert amdgcn_fdiv_fast under !UnsafeDiv and !NeedHighAccuracy.
 bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {
----------------
UnsafeDiv is too imprecise here. This comment should explain in concrete terms why we need to insert the intrinsic, rather than just referring to variable names. We need fdiv.fast when we only need 2.5 ULP of accuracy and denormals are flushed.
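A minimal sketch of the condition stated above. It assumes the allowed error arrives as a ULP value, such as the one FPMathOperator::getFPAccuracy() returns (0.0 when there is no !fpmath metadata, meaning a correctly rounded result is required). It also assumes 2.5 ULP is the bound fdiv.fast guarantees. The helper name canUseFDivFast and the constant kFDivFastMaxUlp are hypothetical and do not appear in AMDGPUCodeGenPrepare.cpp.

```cpp
// Hypothetical accuracy bound assumed for llvm.amdgcn.fdiv.fast, in ULP.
constexpr float kFDivFastMaxUlp = 2.5f;

// Hypothetical helper: decides whether an f32 fdiv may be lowered to
// llvm.amdgcn.fdiv.fast.
//   RequiredUlp: maximum error the IR allows (0.0 = correctly rounded).
//   F32DenormalsFlushed: true when f32 denormals are flushed to zero.
constexpr bool canUseFDivFast(float RequiredUlp, bool F32DenormalsFlushed) {
  // fdiv.fast is acceptable only if the IR tolerates at least 2.5 ULP of
  // error and denormal values are already being flushed.
  return RequiredUlp >= kFDivFastMaxUlp && F32DenormalsFlushed;
}

// Usage: both conditions must hold.
static_assert(canUseFDivFast(2.5f, true), "2.5 ULP allowed, denormals flushed");
static_assert(!canUseFDivFast(0.0f, true), "correctly rounded result required");
static_assert(!canUseFDivFast(2.5f, false), "denormals must be preserved");
```

Stating the condition as "required ULP >= 2.5 and denormals flushed" is the kind of concrete explanation the comment is asking for, as opposed to "!UnsafeDiv and !NeedHighAccuracy".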

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:624
   FastMathFlags FMF = FPOp->getFastMathFlags();
   bool UnsafeDiv = HasUnsafeFPMath || FMF.isFast() ||
+                (FMF.allowReciprocal() &&
----------------
I think this should probably be split into two flags: RcpLegal and UseFDivFast.
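One possible reading of this suggestion, sketched under assumptions. RcpLegal means "a / b may be rewritten using a reciprocal", and UseFDivFast means "emit llvm.amdgcn.fdiv.fast". All struct, field, and function names below are hypothetical. The operand after `FMF.allowReciprocal() &&` is truncated in the quoted diff, so it appears only as a placeholder field, RcpExtraCondition, and is not reconstructed.

```cpp
// Hypothetical inputs to the fdiv lowering decision (illustrative names only).
struct FDivInputs {
  bool HasUnsafeFPMath;     // function-level unsafe FP math
  bool IsFast;              // FMF.isFast()
  bool AllowReciprocal;     // FMF.allowReciprocal()
  bool RcpExtraCondition;   // placeholder for the truncated operand after "&&"
  float RequiredUlp;        // allowed error from !fpmath, 0.0 if absent
  bool F32DenormalsFlushed; // f32 denormals flushed to zero
};

// The two separate decisions that replace the single UnsafeDiv flag.
struct FDivLowering {
  bool RcpLegal;    // may rewrite a / b in terms of a reciprocal
  bool UseFDivFast; // may emit llvm.amdgcn.fdiv.fast instead
};

constexpr FDivLowering classifyFDiv(const FDivInputs &In) {
  const bool RcpLegal = In.HasUnsafeFPMath || In.IsFast ||
                        (In.AllowReciprocal && In.RcpExtraCondition);
  // Consider fdiv.fast only when the reciprocal rewrite is not already
  // allowed, at least 2.5 ULP of error is tolerated, and denormals are flushed.
  const bool UseFDivFast =
      !RcpLegal && In.RequiredUlp >= 2.5f && In.F32DenormalsFlushed;
  return {RcpLegal, UseFDivFast};
}

// Usage: a plain fdiv with !fpmath 2.5 and flushed denormals takes the
// fdiv.fast path; a 'fast' fdiv takes the reciprocal path instead.
static_assert(classifyFDiv({false, false, false, false, 2.5f, true}).UseFDivFast,
              "fdiv.fast path");
static_assert(classifyFDiv({false, true, false, false, 0.0f, true}).RcpLegal,
              "reciprocal path");
```

With two named flags, each lowering path is guarded by a condition that says what it permits. A single catch-all UnsafeDiv cannot say this.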


https://reviews.llvm.org/D71293