[PATCH] D71293: AMDGPU: Generate the correct sequence of code for FDIV32 when correctly-rounded-divide-sqrt is set
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 22 14:40:45 PST 2020
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:640
+ (FMF.allowReciprocal() &&
+ (!HasFP32Denormals || !NeedHighAccuracy || FMF.approxFunc()));
----------------
cfang wrote:
> arsenm wrote:
> > I think this still isn't quite right.
> >
> > I think this should be (FMF.allowReciprocal() && ((!HasFP32Denormals && !NeedHighAccuracy) || FMF.approxFunc())).
> >
> > As is, this will allow reciprocal when denormals are flushed, but the higher fdiv precision is required, which was the case you were trying to fix in the first place
> How could we handle fp16 and fp64? I think HasFP32Denormals only matters for fp32.
>
> Also, the issue I am working on does not seem related to FMF.allowReciprocal() at all, unless arcp is on by default.
Yes, this also needs to account for FP32 denormals. RCP for f16 doesn't care about the fp16 denormal mode.
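
To make the difference between the two conditions concrete, here is a minimal standalone sketch that enumerates every flag combination and reports where the condition in the patch and the suggested condition disagree. The function names and the plain bool parameters are hypothetical stand-ins for FMF.allowReciprocal(), HasFP32Denormals, NeedHighAccuracy and FMF.approxFunc(); this is not the code in AMDGPUCodeGenPrepare.cpp.

```cpp
#include <cstdio>

// Condition as written in the patch under review (line 640).
// Parameters are hypothetical stand-ins for the real flags.
constexpr bool patchCondition(bool arcp, bool hasFP32Denormals,
                              bool needHighAccuracy, bool afn) {
  return arcp && (!hasFP32Denormals || !needHighAccuracy || afn);
}

// Condition suggested in the review comment.
constexpr bool suggestedCondition(bool arcp, bool hasFP32Denormals,
                                  bool needHighAccuracy, bool afn) {
  return arcp && ((!hasFP32Denormals && !needHighAccuracy) || afn);
}

// Walk all 16 flag combinations, print the ones where the two
// conditions differ, and return how many such combinations exist.
int countDisagreements() {
  int count = 0;
  for (int bits = 0; bits < 16; ++bits) {
    const bool arcp = (bits & 1) != 0;
    const bool denormals = (bits & 2) != 0;
    const bool highAccuracy = (bits & 4) != 0;
    const bool afn = (bits & 8) != 0;
    const bool a = patchCondition(arcp, denormals, highAccuracy, afn);
    const bool b = suggestedCondition(arcp, denormals, highAccuracy, afn);
    if (a != b) {
      std::printf("arcp=%d denormals=%d highAccuracy=%d afn=%d: "
                  "patch=%d suggested=%d\n",
                  arcp, denormals, highAccuracy, afn, a, b);
      ++count;
    }
  }
  return count;
}
```

Calling countDisagreements() lists the combinations where the patch still allows the reciprocal but the suggested form does not. One of them is arcp set, denormals flushed, high accuracy required and afn unset, which is the case described in the quoted comment above.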
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71293/new/
https://reviews.llvm.org/D71293