[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 22 07:18:47 PST 2020
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:600
+//
+// We can insert amdgcn_fdiv_fast under !UnsafeDiv and !NeedHighAccuracy.
bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {
----------------
UnsafeDiv is too imprecise here. The comment should explain in concrete terms why we need to insert the intrinsic, not just refer to the variable names. We need fdiv.fast when we only need 2.5 ULP of accuracy and denormals are flushed.
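A minimal standalone sketch of that condition, to make the two requirements explicit. The struct and function names here are hypothetical illustrations and are not taken from the patch or from AMDGPUCodeGenPrepare.cpp; the only facts used are the 2.5 ULP threshold and the flushed-denormals requirement stated above.

```cpp
// Hypothetical illustration only; these names do not exist in LLVM.
struct FDivRequirements {
  // Largest error the division is allowed to have, in ULP.
  // Assumption: 0.0f stands for "no relaxed accuracy was requested".
  float AllowedErrorULP;
  // True when f32 denormals are flushed to zero for this function.
  bool DenormalsFlushed;
};

// fdiv.fast is only acceptable when the caller tolerates at least
// 2.5 ULP of error AND denormals are flushed. Either condition
// failing means the full-accuracy expansion must be kept.
bool canUseFDivFast(const FDivRequirements &Req) {
  return Req.AllowedErrorULP >= 2.5f && Req.DenormalsFlushed;
}
```

Stating the rule as "tolerated error is at least 2.5 ULP and denormals are flushed" is the kind of concrete wording the source comment could use instead of naming UnsafeDiv.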
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:624
FastMathFlags FMF = FPOp->getFastMathFlags();
bool UnsafeDiv = HasUnsafeFPMath || FMF.isFast() ||
+ (FMF.allowReciprocal() &&
----------------
I think this should maybe be rephrased in terms of RcpLegal and UseFDivFast.
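One possible reading of that split, as a standalone sketch rather than the actual patch. Only the names RcpLegal and UseFDivFast come from the comment above; the input struct, the helper function, and the exact conditions are assumptions. In particular, the quoted diff truncates the condition after `FMF.allowReciprocal() &&`, so the missing conjunct is deliberately left out here and the reciprocal flag alone is treated as sufficient.

```cpp
// Hypothetical illustration only; not the code from D71293.
struct FDivInputs {
  bool HasUnsafeFPMath;   // global unsafe-fp-math setting
  bool IsFast;            // stands in for FMF.isFast()
  bool AllowReciprocal;   // stands in for FMF.allowReciprocal()
  float AllowedErrorULP;  // tolerated error in ULP (assumed representation)
  bool DenormalsFlushed;  // f32 denormals are flushed to zero
};

struct FDivLowering {
  bool RcpLegal;     // a reciprocal-based lowering is permitted
  bool UseFDivFast;  // the fdiv.fast intrinsic may be inserted
};

FDivLowering classifyFDiv(const FDivInputs &In) {
  FDivLowering Out;
  // Assumption: any of the "unsafe" signals permits a reciprocal.
  Out.RcpLegal = In.HasUnsafeFPMath || In.IsFast || In.AllowReciprocal;
  // From the review: 2.5 ULP is enough and denormals are flushed.
  Out.UseFDivFast = In.AllowedErrorULP >= 2.5f && In.DenormalsFlushed;
  return Out;
}
```

Keeping the two questions in separate booleans means each one can be documented and tested on its own, which a single UnsafeDiv flag does not allow.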
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71293/new/
https://reviews.llvm.org/D71293