[llvm] [AMDGPU] Optimize image sample followed by llvm.amdgcn.cvt.pkrtz into d16 variant (PR #145203)

Sun Jun 22 00:39:00 PDT 2025

================
@@ -247,6 +247,42 @@ simplifyAMDGCNImageIntrinsic(const GCNSubtarget *ST,
                                        ArgTys[0] = User->getType();
                                      });
         }
+
+        // Fold image.sample + cvt.pkrtz -> extractelement idx0 into a single
+        // d16 image sample.
----------------
DadSchoorse wrote:

We had a similar optimization in Mesa, and it broke tests because, for fixed-point formats, D16 does not round the same way as a 32-bit load followed by v_cvt_pk_f16_f32. D16 rounds directly to nearest even in fp16, while the unoptimized pattern first rounds to nearest even in fp32 and then rounds towards zero in fp16.
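
To make the mismatch concrete, here is a minimal host-side sketch of the two rounding sequences for an 8-bit UNORM texel. It is an illustration under assumptions, not hardware-accurate code: the helper names (`roundToHalfGrid`, `sampleD16`, `sampleThenPkrtz`) are hypothetical, the exact texel value is taken to be `V / 255`, and only values in the normal fp16 range are handled.

```cpp
#include <cmath>

// Hypothetical helper (not an LLVM or Mesa API): round a non-negative value in
// the normal fp16 range onto the fp16 grid, either to nearest even or towards
// zero. The result is returned as a double holding an fp16-representable value.
double roundToHalfGrid(double X, bool TowardZero) {
  int Exp;
  double Mant = std::frexp(X, &Exp);    // X = Mant * 2^Exp, 0.5 <= Mant < 1
  double Scaled = std::ldexp(Mant, 11); // fp16 keeps 11 significant bits
  // std::nearbyint rounds to nearest even under the default rounding mode.
  double Rounded = TowardZero ? std::trunc(Scaled) : std::nearbyint(Scaled);
  return std::ldexp(Rounded, Exp - 11);
}

// Assumed model of the D16 path: one round-to-nearest-even step from the
// exact UNORM8 value V / 255 straight to fp16.
double sampleD16(unsigned V) {
  return roundToHalfGrid(V / 255.0, /*TowardZero=*/false);
}

// Assumed model of the unoptimized path: round to nearest even in fp32,
// then round towards zero to fp16 (the cvt.pkrtz step).
double sampleThenPkrtz(unsigned V) {
  float F32 = static_cast<float>(V / 255.0);
  return roundToHalfGrid(F32, /*TowardZero=*/true);
}
```

Under this model, `V = 9` gives `1157 * 2^-15` on the D16 path but `1156 * 2^-15` on the sample-then-pkrtz path, while `V = 1` and `V = 255` agree on both paths. The two sequences are therefore not interchangeable for fixed-point formats.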

https://github.com/llvm/llvm-project/pull/145203