[llvm] [AMDGPU] Optimize image sample followed by llvm.amdgcn.cvt.pkrtz into d16 variant (PR #145203)
Harrison Hao via llvm-commits
llvm-commits at lists.llvm.org
Sun Jun 22 00:51:56 PDT 2025
================
@@ -247,6 +247,42 @@ simplifyAMDGCNImageIntrinsic(const GCNSubtarget *ST,
ArgTys[0] = User->getType();
});
}
+
+ // Fold image.sample + cvt.pkrtz -> extractelement idx0 into a single
+ // d16 image sample.
----------------
harrisonGPU wrote:
Thanks! Could you share some test cases for me to verify this issue? So far, I haven’t encountered the problem you mentioned. Since image sampling with D16 only uses the lower 16 bits, and I ensure that the first argument to cvt.pkrtz is the result of the image sample, I believe this folding is valid. I’ve tested a few cases and haven’t observed any issues so far, but I’ll continue testing more to be safe.
Additionally, we've used this optimization in other compilers (not LLVM) for years without encountering any problems.
https://github.com/llvm/llvm-project/pull/145203
More information about the llvm-commits
mailing list