[llvm] [AMDGPU] Optimize image sample followed by llvm.amdgcn.cvt.pkrtz into d16 variant (PR #145203)

Harrison Hao via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 30 19:57:44 PDT 2025


================
@@ -247,6 +247,42 @@ simplifyAMDGCNImageIntrinsic(const GCNSubtarget *ST,
                                        ArgTys[0] = User->getType();
                                      });
         }
+
+        // Fold image.sample + cvt.pkrtz -> extractelement idx0 into a single
+        // d16 image sample.
----------------
harrisonGPU wrote:

I have verified that folding an image sample followed by `llvm.amdgcn.cvt.pkrtz` into the D16 variant is numerically sound.

This is because Section 9.3.1 “D16 Instruction” of the RDNA 3 Shader Instruction Set Architecture manual states:
> Conversion of float32 to float16 uses truncation; conversion of other input data formats uses round-to-nearest-even.

This means that the D16 variant of an image instruction truncates when converting float32 results to float16, i.e. it rounds toward zero (RTZ), which is exactly the rounding behavior of `llvm.amdgcn.cvt.pkrtz`. The two can therefore be safely combined without any accuracy concerns.
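
For illustration, here is a minimal IR sketch of the kind of pattern this fold targets, assuming a single-channel (dmask = 1) 2D sample; the `%rsrc`/`%samp` operands and the use of `poison` for the unused second `cvt.pkrtz` operand are illustrative, and the actual matched pattern in the patch may be more general:

```llvm
; Before: scalar f32 sample, packed to f16 with RTZ, low half extracted.
%s  = call float @llvm.amdgcn.image.sample.2d.f32.f32(i32 1, float %u, float %v, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
%pk = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %s, float poison)
%h  = extractelement <2 x half> %pk, i64 0

; After: a single D16 sample returning f16 directly; per the ISA text
; above, the hardware's f32 -> f16 truncation matches the RTZ rounding
; of cvt.pkrtz, so the result is bit-identical.
%h = call half @llvm.amdgcn.image.sample.2d.f16.f32(i32 1, float %u, float %v, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
```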

Reference:
https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf

https://github.com/llvm/llvm-project/pull/145203

