[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 22 14:40:45 PST 2020


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:640
+         (FMF.allowReciprocal() &&
+          (!HasFP32Denormals || !NeedHighAccuracy || FMF.approxFunc()));
 
----------------
cfang wrote:
> arsenm wrote:
> > I think this still isn't quite right.
> > 
> > I think this should be (FMF.allowReciprocal() && ((!HasFP32Denormals && !NeedHighAccuracy) || FMF.approxFunc())).
> > 
> > As is, this will allow the reciprocal when denormals are flushed but the higher fdiv precision is required, which was the case you were trying to fix in the first place.
> How could we handle fp16 and fp64? I think HasFP32Denormals only matters for fp32.
> 
> Also, the issue I am working on does not seem to be related to FMF.allowReciprocal() at all, unless arcp is set by default.
Yes, this also needs to account for FP32 denormals. RCP for f16 doesn't care about the fp16 denormal mode.
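
To make the difference between the two conditions concrete, here is a minimal standalone sketch that evaluates both over every flag combination. The `DivFlags` struct and the function names are hypothetical stand-ins made up for illustration; they are plain booleans, not the real `FastMathFlags` API or the actual code in AMDGPUCodeGenPrepare.cpp.

```cpp
#include <cassert>

// Hypothetical stand-in for the four inputs to the check; not LLVM's API.
struct DivFlags {
  bool AllowReciprocal;  // models FMF.allowReciprocal()
  bool ApproxFunc;       // models FMF.approxFunc()
  bool HasFP32Denormals; // models the subtarget's f32 denormal setting
  bool NeedHighAccuracy; // models the fdiv accuracy requirement
};

// Condition as written in the patch under review.
inline bool patchCondition(const DivFlags &F) {
  return F.AllowReciprocal &&
         (!F.HasFP32Denormals || !F.NeedHighAccuracy || F.ApproxFunc);
}

// Condition suggested in the review comment.
inline bool suggestedCondition(const DivFlags &F) {
  return F.AllowReciprocal &&
         ((!F.HasFP32Denormals && !F.NeedHighAccuracy) || F.ApproxFunc);
}

// Count the flag combinations on which the two conditions disagree.
inline int countDisagreements() {
  int N = 0;
  for (int Bits = 0; Bits < 16; ++Bits) {
    DivFlags F{(Bits & 1) != 0, (Bits & 2) != 0, (Bits & 4) != 0,
               (Bits & 8) != 0};
    if (patchCondition(F) != suggestedCondition(F))
      ++N;
  }
  return N;
}

// The case called out in the review: arcp set, denormals flushed,
// high accuracy required, no afn.
inline void checkReviewCase() {
  DivFlags F{true, false, false, true};
  assert(patchCondition(F));      // the patch still allows the reciprocal
  assert(!suggestedCondition(F)); // the suggested form rejects it
}
```

Under these assumptions the two forms differ only when arcp is set, afn is not, and exactly one of HasFP32Denormals and NeedHighAccuracy is true. The sketch does not model the f16/f64 question raised above.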


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71293/new/

https://reviews.llvm.org/D71293




