[llvm] [AMDGPU] Optimize image sample followed by llvm.amdgcn.cvt.pkrtz into d16 variant (PR #145203)
Harrison Hao via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 2 02:32:31 PDT 2025
================
@@ -247,6 +247,42 @@ simplifyAMDGCNImageIntrinsic(const GCNSubtarget *ST,
ArgTys[0] = User->getType();
});
}
+
+ // Fold image.sample + cvt.pkrtz -> extractelement idx0 into a single
+ // d16 image sample.
----------------
harrisonGPU wrote:
So you mean that when the input data is in a fixed-point format, it gets converted to fp32 using RTE, and the subsequent pkrtz then applies RTZ?
If we fold this into a D16 image sample, only RTZ would be applied, which could cause accuracy issues. Is that what you're saying?
However, I can't see any data format information on the image instruction in the IR, and I haven't found a case that clearly uses a fixed-point format.
Do you have an example I could use to verify this? By fixed-point formats, you mean things like UNORM, right?
https://github.com/llvm/llvm-project/pull/145203