[PATCH] D153472: AMDGPU: Correctly expand f64 sqrt intrinsic

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 27 02:54:41 PDT 2023


foad added a comment.

> I am tempted to do this in an IR expansion instead. In the IR
> we could take advantage of computeKnownFPClass to avoid
> the 0-or-inf argument check.

Could you use computeKnownFPClass to set ninf and nsz if the argument is known not to be inf/0, and then check those flags in your existing legalization code?



================
Comment at: llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1184
 
+  /// Build and insert a \p Res = G_IS_FPCLASS \p Pred\p Src, \p Mask
+  MachineInstrBuilder buildIsFPClass(const DstOp &Res, const SrcOp &Src,
----------------



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4861
+  //
+  //   sqrt(x) = g3
+
----------------
It's nice that you have a comment showing the algorithm, but it'd be way more useful if the names in the comment matched the names in the code.

Also what do the `=>` mean? It looks like they show alternative expansions, and I guess we implement the versions on the RHS?? I can see that the second one avoids the need to calculate h2, but what is the point of the first one?


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4873
+
+  auto ScaleConstant = B.buildFConstant(F64, 0x1.0p-767);
+
----------------
What is this magic number and why is scaling required? If it is to avoid denormal inputs to amdgcn_rsq then please say that in a comment.


================
Comment at: llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll:44
-; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT:    v_sqrt_f64_e64 v[0:1], -|v[0:1]|
-; GCN-NEXT:    s_setpc_b64 s[30:31]
----------------
arsenm wrote:
> Pierre-vh wrote:
> > I think I need some context, why is `v_sqrt_f64` so bad that this expansion is preferred? Accuracy/semantics?
> It's nowhere near accurate enough, I don't really know what the point of the instruction is. The raw instruction definitely just fails OpenCL conformance
`v_sqrt_f64` is documented as having "(2**29)ULP accuracy"(!). My guess is that it just does the f64/f32 format conversions for you, and uses the same underlying sqrt expansion as `v_sqrt_f32`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153472/new/

https://reviews.llvm.org/D153472



More information about the llvm-commits mailing list