[PATCH] D153472: AMDGPU: Correctly expand f64 sqrt intrinsic

Tue Jul 25 04:37:06 PDT 2023

Pierre-vh accepted this revision as: Pierre-vh.
Pierre-vh added a comment.
This revision is now accepted and ready to land.

LGTM with some nits

================
Comment at: llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1183-1184
                                 std::optional<unsigned> Flags = std::nullopt);

+  /// Build and insert a \p Res = G_IS_FPCLASS \p Pred\p Src, \p Mask
+  MachineInstrBuilder buildIsFPClass(const DstOp &Res, const SrcOp &Src,
----------------

================
Comment at: llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1186
+  MachineInstrBuilder buildIsFPClass(const DstOp &Res, const SrcOp &Src,
+                                     unsigned Mask) {
+    return buildInstr(TargetOpcode::G_IS_FPCLASS, {Res},
----------------
Nit: can't `Mask` just be `int64_t` directly?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4844
+  // For double type, the SQRT and RSQ instructions don't have required
+  // precision, we apply Goldschmidt's algorithm to improve the result:
+  //
----------------
Add that they fail OpenCL conformance and that many users avoid them because of that, just for some context.
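
For readers without the background, here is a minimal sketch of a Goldschmidt-style square root refinement in plain C++. It is not the instruction sequence this patch emits. The helper name `goldschmidtSqrt`, the iteration count, and the use of a float-precision reciprocal square root as a stand-in for the low-precision hardware estimate are all assumptions made for illustration. The sketch also assumes the input lies within float range, so it does no input scaling.

```cpp
#include <cmath>

// Hypothetical helper (not part of the patch): refine a low-precision
// reciprocal-sqrt estimate into a double-precision sqrt using coupled
// Goldschmidt iterations. Assumes x is within float range when finite.
double goldschmidtSqrt(double x, int iterations = 3) {
  // Defer special cases (zero, negative, inf, NaN) to the library.
  if (!(x > 0.0) || std::isinf(x))
    return std::sqrt(x);

  // Low-precision estimate of 1/sqrt(x); stands in for a hardware RSQ.
  double y = 1.0f / std::sqrt(static_cast<float>(x));

  double g = x * y;   // g approximates sqrt(x)
  double h = 0.5 * y; // h approximates 1 / (2 * sqrt(x))

  for (int i = 0; i < iterations; ++i) {
    double r = std::fma(-g, h, 0.5); // residual: 0.5 - g*h
    g = std::fma(g, r, g);           // g <- g + g*r
    h = std::fma(h, r, h);           // h <- h + h*r
  }
  return g;
}
```

Each iteration roughly squares the relative error of the estimate, which is why a few fused multiply-adds are enough to go from a single-precision guess to near double precision.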

================
Comment at: llvm/test/Analysis/CostModel/AMDGPU/arith-fp.ll:55
 ; ALL-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16F32 = call <16 x float> @llvm.sqrt.v16f32(<16 x float> undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %F64 = call double @llvm.sqrt.f64(double undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F64 = call <2 x double> @llvm.sqrt.v2f64(<2 x double> undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4F64 = call <4 x double> @llvm.sqrt.v4f64(<4 x double> undef)
-; ALL-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8F64 = call <8 x double> @llvm.sqrt.v8f64(<8 x double> undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F64 = call double @llvm.sqrt.f64(double undef)
+; ALL-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2F64 = call <2 x double> @llvm.sqrt.v2f64(<2 x double> undef)
----------------
Why did this change?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153472/new/

https://reviews.llvm.org/D153472