[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 20 19:53:43 PST 2020


arsenm added a comment.

The GlobalISel path should also be fixed, but that can be a follow up patch



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:585
+// reduce precision. Perform RCP optimization to insert intrinsic whenever
+// it is suitable to do so.
 bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {
----------------
Needs comment explaining the interaction between !fpmath requirements and denormals. This could use a chart of different fast math options, FP math and denormal handling and the expected lowering


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:603
   bool UnsafeDiv = HasUnsafeFPMath || FMF.isFast() ||
                                       FMF.allowReciprocal();
   // With UnsafeDiv node will be optimized to just rcp and mul.
----------------
I don't think just allow reciprocal is sufficient without either checking FPMath or afn. I think this needs to be something more like
UnsafeFP || isFast || (allowReciprocal && (denormal hasLowAccuracy || approximateFunction))


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:612-614
+  float ULP = FPOp->getFPAccuracy();
+  if (ULP < 2.5f)
+    DoFDivFast = false;
----------------
It would be clearer to do something like 
bool NeedHighAccuracy = !FPMath || FPMath->getFPAccuracy() < 2.5


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:617
+  // Do not translate to rcp intrinsic for f32 when there is denorm support
+  // or fpmath metadat is unavailable.
+  bool NoRCPIntrinsic = !UnsafeDiv &&
----------------
Typo metadat


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:618
+  // or fpmath metadat is unavailable.
+  bool NoRCPIntrinsic = !UnsafeDiv &&
+                        (Ty->isFloatTy() && (HasFP32Denormals || !FPMath));
----------------
It would be clearer to invert this, instead of the logic below relying on the double negative


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:619
+  bool NoRCPIntrinsic = !UnsafeDiv &&
+                        (Ty->isFloatTy() && (HasFP32Denormals || !FPMath));
+
----------------
FPMath should be checked once, and in relation to it's value only. Checking for the lack of metadata here is imprecise


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7540
   const SDNodeFlags Flags = Op->getFlags();
   bool Unsafe = DAG.getTarget().Options.UnsafeFPMath || Flags.hasAllowReciprocal();
 
----------------
Based on the original problem, Flags.hasAllowReciprocal() isn't sufficient here. Without knowledge of !fpmath, this also needs approximate function


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7542
 
-  if (!Unsafe && VT == MVT::f32 && hasFP32Denormals(DAG.getMachineFunction()))
+  // Do rcp optimization only under "Unsafe" math here.
+  // NOTE: We already performed RCP optimization to insert intrinsics in
----------------
Needs comment explaining why


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:8726
 
+  if ((VT == MVT::f32 || VT == MVT::f16) && N0.getOpcode() == ISD::FSQRT)
+    return DCI.DAG.getNode(AMDGPUISD::RSQ, SDLoc(N), VT,
----------------
Braces


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71293/new/

https://reviews.llvm.org/D71293





More information about the llvm-commits mailing list