[PATCH] D153472: AMDGPU: Correctly expand f64 sqrt intrinsic
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 27 02:54:41 PDT 2023
foad added a comment.
> I am tempted to do this in an IR expansion instead. In the IR
> we could take advantage of computeKnownFPClass to avoid
> the 0-or-inf argument check.
Could you use computeKnownFPClass to set ninf and nsz if the argument is known not to be inf/0, and then check those flags in your existing legalization code?
================
Comment at: llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1184
+ /// Build and insert a \p Res = G_IS_FPCLASS \p Pred\p Src, \p Mask
+ MachineInstrBuilder buildIsFPClass(const DstOp &Res, const SrcOp &Src,
----------------
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4861
+ //
+ // sqrt(x) = g3
+
----------------
It's nice that you have a comment showing the algorithm, but it'd be way more useful if the names in the comment matched the names in the code.
Also what do the `=>` mean? It looks like they show alternative expansions, and I guess we implement the versions on the RHS?? I can see that the second one avoids the need to calculate h2, but what is the point of the first one?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4873
+
+ auto ScaleConstant = B.buildFConstant(F64, 0x1.0p-767);
+
----------------
What is this magic number and why is scaling required? If it is to avoid denormal inputs to amdgcn_rsq then please say that in a comment.
================
Comment at: llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll:44
-; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT: v_sqrt_f64_e64 v[0:1], -|v[0:1]|
-; GCN-NEXT: s_setpc_b64 s[30:31]
----------------
arsenm wrote:
> Pierre-vh wrote:
> > I think I need some context, why is `v_sqrt_f64` so bad that this expansion is preferred? Accuracy/semantics?
> It's nowhere near accurate enough, I don't really know what the point of the instruction is. The raw instruction definitely just fails OpenCL conformance
`v_sqrt_f64` is documented as having "(2**29)ULP accuracy"(!). My guess is that it just does the f64/f32 format conversions for you, and uses the same underlying sqrt expansion as `v_sqrt_f32`.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D153472/new/
https://reviews.llvm.org/D153472
More information about the llvm-commits
mailing list