[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set

Changpeng Fang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 22 14:31:38 PST 2020


cfang marked 3 inline comments as done.
cfang added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:616
+  // To control whether to lower to fdiv.fast.
+  bool UseFDivFast = true;
+  Type *Ty = FDiv.getType()->getScalarType();
----------------
arsenm wrote:
> You can just initialize this below with the logical value instead of setting the value conditionally
Thanks, will do it that way.
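The restructuring suggested above is a general C++ idiom. The sketch below is only an illustration of it. The names `FDivInputs`, `useFDivFastConditional` and `useFDivFastDirect` are hypothetical, and the conditions are deliberately simplified rather than copied from the patch:

```cpp
#include <cassert>

// Hypothetical, simplified inputs; the real pass derives its conditions
// from the instruction type, the !fpmath metadata and the subtarget.
struct FDivInputs {
  bool IsF32;
  bool NeedHighAccuracy;
};

// Pattern the review asks to avoid: start from a default value and
// overwrite it on some paths.
bool useFDivFastConditional(const FDivInputs &In) {
  bool UseFDivFast = true;
  if (!In.IsF32)
    UseFDivFast = false;
  if (In.NeedHighAccuracy)
    UseFDivFast = false;
  return UseFDivFast;
}

// Suggested pattern: initialize once, with the logical expression itself.
bool useFDivFastDirect(const FDivInputs &In) {
  const bool UseFDivFast = In.IsF32 && !In.NeedHighAccuracy;
  return UseFDivFast;
}

// Exhaustively confirm that both forms compute the same value.
void checkFormsAgree() {
  for (int Bits = 0; Bits < 4; ++Bits) {
    FDivInputs In{(Bits & 1) != 0, (Bits & 2) != 0};
    assert(useFDivFastConditional(In) == useFDivFastDirect(In));
  }
}
```

The single initialization makes the full condition visible in one place and lets the variable be `const`.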


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:640
+         (FMF.allowReciprocal() &&
+          (!HasFP32Denormals || !NeedHighAccuracy || FMF.approxFunc()));
 
----------------
arsenm wrote:
> I think this still isn't quite right.
> 
> I think this should be (FMF.allowReciprocal() && ((!HasFP32Denormals && !NeedHighAccuracy) ||  FMF.approxFunc())).
> 
> As is, this will allow reciprocal when denormals are flushed, but the higher fdiv precision is required, which was the case you were trying to fix in the first place
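To make the difference concrete, the two conditions can be written as plain boolean functions. This is a sketch that isolates only the `FMF.allowReciprocal()` term quoted above, not the whole `UseFDivFast` expression. Each query (`FMF.allowReciprocal()`, `HasFP32Denormals`, `NeedHighAccuracy`, `FMF.approxFunc()`) is flattened into a field of a hypothetical `ArcpInputs` struct:

```cpp
#include <cassert>

// Hypothetical flattened inputs to the arcp term of the expression.
struct ArcpInputs {
  bool AllowReciprocal;  // stands in for FMF.allowReciprocal()
  bool HasFP32Denormals; // f32 denormals are kept rather than flushed
  bool NeedHighAccuracy; // more accuracy is required than fdiv.fast gives
  bool ApproxFunc;       // stands in for FMF.approxFunc()
};

// Condition as written in the revision under review.
bool arcpAsWritten(const ArcpInputs &I) {
  return I.AllowReciprocal &&
         (!I.HasFP32Denormals || !I.NeedHighAccuracy || I.ApproxFunc);
}

// Condition proposed in the review comment.
bool arcpProposed(const ArcpInputs &I) {
  return I.AllowReciprocal &&
         ((!I.HasFP32Denormals && !I.NeedHighAccuracy) || I.ApproxFunc);
}
```

Take the case where arcp is set, denormals are flushed, high accuracy is required and afn is absent. Here `arcpAsWritten` returns true while `arcpProposed` returns false, which is the case the comment points out.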
How should we handle fp16 and fp64? I think HasFP32Denormals only matters for fp32.

Also, the issue I am working on does not seem to be related to FMF.allowReciprocal() at all, unless arcp is set by default.
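One possible way to scope the denormal check to f32 is sketched below. It is a sketch only, under stated assumptions, and is not what the patch or the reviewer proposed. The `ScalarKind` enum, `denormalsBlockFastPath` and `allowReciprocalPath` are hypothetical. The enum stands in for `FDiv.getType()->getScalarType()`, and the sketch also shows that the arcp term can never fire when arcp is not set:

```cpp
#include <cassert>

// Hypothetical scalar-type tag standing in for the IR scalar type.
enum class ScalarKind { F16, F32, F64 };

// The f32 denormal mode is consulted only when the division is f32;
// for f16 and f64 it is ignored in this sketch.
bool denormalsBlockFastPath(ScalarKind Kind, bool HasFP32Denormals) {
  return Kind == ScalarKind::F32 && HasFP32Denormals;
}

// Hypothetical arcp term built on that guard.
bool allowReciprocalPath(ScalarKind Kind, bool AllowReciprocal,
                         bool HasFP32Denormals, bool NeedHighAccuracy,
                         bool ApproxFunc) {
  if (!AllowReciprocal)
    return false; // without arcp this term never enables the fast path
  if (ApproxFunc)
    return true;
  return !denormalsBlockFastPath(Kind, HasFP32Denormals) && !NeedHighAccuracy;
}
```

Whether f16 and f64 should really ignore the denormal mode is the open question raised above. The sketch only shows where such a type check could sit.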


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71293/new/

https://reviews.llvm.org/D71293
