[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set

Wed Jan 22 13:00:45 PST 2020

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:616
+  // To control whether to lower to fdiv.fast.
+  bool UseFDivFast = true;
+  Type *Ty = FDiv.getType()->getScalarType();
----------------
You can just initialize this below with the logical value instead of setting the value conditionally

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:640
+         (FMF.allowReciprocal() &&
+          (!HasFP32Denormals || !NeedHighAccuracy || FMF.approxFunc()));

----------------
I think this still isn't quite right.

I think this should be (FMF.allowReciprocal() && ((!HasFP32Denormals && !NeedHighAccuracy) ||  FMF.approxFunc())).

As is, this will allow reciprocal when denormals are flushed, but the higher fdiv precision is required, which was the case you were trying to fix in the first place

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7542
+                (Flags.hasAllowReciprocal() && Flags.hasApproximateFuncs());
+
+  //   Do rcp optimization only when fast unsafe rcp is legal here.
----------------
This still needs the denormal and type checks

================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll:88
+
+  %arcp.low.accuracy = fdiv arcp float 1.0, %x, !fpmath !0
+  store volatile float %arcp.low.accuracy, float addrspace(1)* %out
----------------
This should not have produced rcp since denormals are enabled and it doesn't have afn.

================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll:91-92
+
+  %arcp.high.accuracy = fdiv arcp float 1.0, %x, !fpmath !1
+  store volatile float %arcp.high.accuracy, float addrspace(1)* %out
+
----------------
The name says high accuracy, but 5 ulp is lower accuracy. This didn't form rcp, but I think for the wrong reason

================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll:94-98
+  %arcp.low.afn = fdiv arcp afn float 1.0, %x, !fpmath !0
+  store volatile float %arcp.low.afn, float addrspace(1)* %out
+
+  %arcp.high.afn = fdiv arcp afn float 1.0, %x, !fpmath !1
+  store volatile float %arcp.high.afn, float addrspace(1)* %out
----------------
These two I think are OK because of afn

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71293/new/

https://reviews.llvm.org/D71293