[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 22 07:18:47 PST 2020


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:600
+//
+// We can insert amdgcn_fdiv_fast under !UnsafeDiv and !NeedHighAccuracy.
 bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {
----------------
UnsafeDiv is too imprecise here. The comment should explain in concrete terms why we need to insert the intrinsic, not just refer to the variable names: we need fdiv.fast when we only need 2.5 ULP and denormals are flushed.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:624
   FastMathFlags FMF = FPOp->getFastMathFlags();
   bool UnsafeDiv = HasUnsafeFPMath || FMF.isFast() ||
+                (FMF.allowReciprocal() &&
----------------
I think this should maybe be rephrased into two separate conditions, RcpLegal and UseFDivFast.
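
A rough sketch of what that split could look like, written as standalone C++ rather than against the real LLVM classes. FDivInfo, isRcpLegal and useFDivFast are hypothetical names for illustration only; the exact UnsafeDiv condition in the patch is truncated in the quoted diff and is not reproduced here, so the reciprocal test below is a simplifying assumption.

```cpp
// Hypothetical stand-in for the state visitFDiv consults; this is not
// LLVM's API, just plain data so the two predicates can be read in isolation.
struct FDivInfo {
  bool HasUnsafeFPMath;   // function-level unsafe FP math enabled
  bool IsFast;            // corresponds to FMF.isFast()
  bool AllowReciprocal;   // corresponds to FMF.allowReciprocal()
  float RequiredULP;      // accuracy the division must meet, in ULP
  bool FlushesDenormals;  // f32 denormals are flushed
};

// Assumption: a plain reciprocal is acceptable whenever any of the
// relaxed-math conditions holds. The patch's real condition has an extra
// clause that is cut off in the quoted diff.
constexpr bool isRcpLegal(const FDivInfo &I) {
  return I.HasUnsafeFPMath || I.IsFast || I.AllowReciprocal;
}

// fdiv.fast is only wanted when rcp is not already legal, 2.5 ULP is
// enough, and denormals are flushed.
constexpr bool useFDivFast(const FDivInfo &I) {
  return !isRcpLegal(I) && I.RequiredULP >= 2.5f && I.FlushesDenormals;
}
```

Naming the two cases separately would let the comment above visitFDiv state each condition directly instead of pointing at UnsafeDiv.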


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71293/new/

https://reviews.llvm.org/D71293




