[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 22 13:00:45 PST 2020
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:616
+ // To control whether to lower to fdiv.fast.
+ bool UseFDivFast = true;
+ Type *Ty = FDiv.getType()->getScalarType();
----------------
You can just initialize this below with the logical value instead of setting the value conditionally
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:640
+ (FMF.allowReciprocal() &&
+ (!HasFP32Denormals || !NeedHighAccuracy || FMF.approxFunc()));
----------------
I think this still isn't quite right.
I think this should be (FMF.allowReciprocal() && ((!HasFP32Denormals && !NeedHighAccuracy) || FMF.approxFunc())).
As is, this will allow reciprocal when denormals are flushed, but the higher fdiv precision is required, which was the case you were trying to fix in the first place
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7542
+ (Flags.hasAllowReciprocal() && Flags.hasApproximateFuncs());
+
+ // Do rcp optimization only when fast unsafe rcp is legal here.
----------------
This still needs the denormal and type checks
================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll:88
+
+ %arcp.low.accuracy = fdiv arcp float 1.0, %x, !fpmath !0
+ store volatile float %arcp.low.accuracy, float addrspace(1)* %out
----------------
This should not have produced rcp since denormals are enabled and it doesn't have afn.
================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll:91-92
+
+ %arcp.high.accuracy = fdiv arcp float 1.0, %x, !fpmath !1
+ store volatile float %arcp.high.accuracy, float addrspace(1)* %out
+
----------------
The name says high accuracy, but 5 ulp is lower accuracy. This didn't form rcp, but I think for the wrong reason
================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll:94-98
+ %arcp.low.afn = fdiv arcp afn float 1.0, %x, !fpmath !0
+ store volatile float %arcp.low.afn, float addrspace(1)* %out
+
+ %arcp.high.afn = fdiv arcp afn float 1.0, %x, !fpmath !1
+ store volatile float %arcp.high.afn, float addrspace(1)* %out
----------------
These two I think are OK because of afn
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71293/new/
https://reviews.llvm.org/D71293
More information about the llvm-commits
mailing list