[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set

Mon Jan 20 19:53:43 PST 2020

arsenm added a comment.

The GlobalISel path should also be fixed, but that can be a follow up patch

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:585
+// reduce precision. Perform RCP optimization to insert intrinsic whenever
+// it is suitable to do so.
 bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {
----------------
Needs comment explaining the interaction between !fpmath requirements and denormals. This could use a chart of different fast math options, FP math and denormal handling and the expected lowering

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:603
   bool UnsafeDiv = HasUnsafeFPMath || FMF.isFast() ||
                                       FMF.allowReciprocal();
   // With UnsafeDiv node will be optimized to just rcp and mul.
----------------
I don't think just allow reciprocal is sufficient without either checking FPMath or afn. I think this needs to be something more like
UnsafeFP || isFast || (allowReciprocal && (denormal hasLowAccuracy || approximateFunction))

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:612-614
+  float ULP = FPOp->getFPAccuracy();
+  if (ULP < 2.5f)
+    DoFDivFast = false;
----------------
It would be clearer to do something like 
bool NeedHighAccuracy = !FPMath || FPMath->getFPAccuracy() < 2.5

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:617
+  // Do not translate to rcp intrinsic for f32 when there is denorm support
+  // or fpmath metadat is unavailable.
+  bool NoRCPIntrinsic = !UnsafeDiv &&
----------------
Typo metadat

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:618
+  // or fpmath metadat is unavailable.
+  bool NoRCPIntrinsic = !UnsafeDiv &&
+                        (Ty->isFloatTy() && (HasFP32Denormals || !FPMath));
----------------
It would be clearer to invert this, instead of the logic below relying on the double negative

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:619
+  bool NoRCPIntrinsic = !UnsafeDiv &&
+                        (Ty->isFloatTy() && (HasFP32Denormals || !FPMath));
+
----------------
FPMath should be checked once, and in relation to it's value only. Checking for the lack of metadata here is imprecise

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7540
   const SDNodeFlags Flags = Op->getFlags();
   bool Unsafe = DAG.getTarget().Options.UnsafeFPMath || Flags.hasAllowReciprocal();

----------------
Based on the original problem, Flags.hasAllowReciprocal() isn't sufficient here. Without knowledge of !fpmath, this also needs approximate function

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7542

-  if (!Unsafe && VT == MVT::f32 && hasFP32Denormals(DAG.getMachineFunction()))
+  // Do rcp optimization only under "Unsafe" math here.
+  // NOTE: We already performed RCP optimization to insert intrinsics in
----------------
Needs comment explaining why

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:8726

+  if ((VT == MVT::f32 || VT == MVT::f16) && N0.getOpcode() == ISD::FSQRT)
+    return DCI.DAG.getNode(AMDGPUISD::RSQ, SDLoc(N), VT,
----------------
Braces

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71293/new/

https://reviews.llvm.org/D71293