[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 20 19:53:43 PST 2020
arsenm added a comment.
The GlobalISel path should also be fixed, but that can be a follow up patch
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:585
+// reduce precision. Perform RCP optimization to insert intrinsic whenever
+// it is suitable to do so.
bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {
----------------
Needs comment explaining the interaction between !fpmath requirements and denormals. This could use a chart of different fast math options, FP math and denormal handling and the expected lowering
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:603
bool UnsafeDiv = HasUnsafeFPMath || FMF.isFast() ||
FMF.allowReciprocal();
// With UnsafeDiv node will be optimized to just rcp and mul.
----------------
I don't think just allow reciprocal is sufficient without either checking FPMath or afn. I think this needs to be something more like
UnsafeFP || isFast || (allowReciprocal && (denormal hasLowAccuracy || approximateFunction))
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:612-614
+ float ULP = FPOp->getFPAccuracy();
+ if (ULP < 2.5f)
+ DoFDivFast = false;
----------------
It would be clearer to do something like
bool NeedHighAccuracy = !FPMath || FPMath->getFPAccuracy() < 2.5
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:617
+ // Do not translate to rcp intrinsic for f32 when there is denorm support
+ // or fpmath metadat is unavailable.
+ bool NoRCPIntrinsic = !UnsafeDiv &&
----------------
Typo metadat
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:618
+ // or fpmath metadat is unavailable.
+ bool NoRCPIntrinsic = !UnsafeDiv &&
+ (Ty->isFloatTy() && (HasFP32Denormals || !FPMath));
----------------
It would be clearer to invert this, instead of the logic below relying on the double negative
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:619
+ bool NoRCPIntrinsic = !UnsafeDiv &&
+ (Ty->isFloatTy() && (HasFP32Denormals || !FPMath));
+
----------------
FPMath should be checked once, and in relation to it's value only. Checking for the lack of metadata here is imprecise
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7540
const SDNodeFlags Flags = Op->getFlags();
bool Unsafe = DAG.getTarget().Options.UnsafeFPMath || Flags.hasAllowReciprocal();
----------------
Based on the original problem, Flags.hasAllowReciprocal() isn't sufficient here. Without knowledge of !fpmath, this also needs approximate function
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7542
- if (!Unsafe && VT == MVT::f32 && hasFP32Denormals(DAG.getMachineFunction()))
+ // Do rcp optimization only under "Unsafe" math here.
+ // NOTE: We already performed RCP optimization to insert intrinsics in
----------------
Needs comment explaining why
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:8726
+ if ((VT == MVT::f32 || VT == MVT::f16) && N0.getOpcode() == ISD::FSQRT)
+ return DCI.DAG.getNode(AMDGPUISD::RSQ, SDLoc(N), VT,
----------------
Braces
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71293/new/
https://reviews.llvm.org/D71293
More information about the llvm-commits
mailing list