[llvm] [AMDGPU] Optimize image sample followed by llvm.amdgcn.cvt.pkrtz into d16 variant (PR #145203)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 1 02:43:33 PDT 2025
================
@@ -247,6 +247,42 @@ simplifyAMDGCNImageIntrinsic(const GCNSubtarget *ST,
ArgTys[0] = User->getType();
});
}
+
+ // Fold image.sample + cvt.pkrtz -> extractelement idx0 into a single
+ // d16 image sample.
+ // Pattern to match:
+ // %sample = call float @llvm.amdgcn.image.sample...
+ // %pack = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %sample,
+ // float %any)
+ // %low = extractelement <2 x half> %pack, i64 0
+ // Replacement:
+ // call half @llvm.amdgcn.image.sample
----------------
jayfoad wrote:
This seems OK, but do we need to handle more cases? E.g.
- What if both inputs of cvt.pkrtz come from image.sample instructions?
- What if image.sample returns `<2 x float>` or `<4 x float>` and all values are converted to f16?
Incidentally, it would be easier to implement the pattern matching if we provided a scalar `half @llvm.amdgcn.cvt.rtz(float)` intrinsic, instead of one intrinsic that does both the conversion and the packing.
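
To make the last point concrete, here is a toy sketch of the two matching shapes. It is plain C++ and deliberately does not use the LLVM API: `Node`, `Op`, `matchPackedSource` and `matchScalarSource` are hypothetical names invented for illustration, and `CvtRtz` stands for the scalar intrinsic suggested above, which is an assumption here, not something confirmed to exist.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy stand-in for IR values. This is NOT the LLVM API; every name here is
// hypothetical and exists only to compare the two matching shapes.
enum class Op { ImageSample, CvtPkrtz, CvtRtz, ExtractElement, Other };

struct Node {
  Op Opcode;
  std::vector<const Node *> Operands;
  unsigned Lane; // only meaningful for ExtractElement

  explicit Node(Op O, std::vector<const Node *> Ops = {}, unsigned L = 0)
      : Opcode(O), Operands(std::move(Ops)), Lane(L) {}
};

// Packed form: extractelement(cvt.pkrtz(a, b), lane).
// The matcher has to look through two nodes and select the operand that
// feeds the extracted lane before it can test for an image sample.
const Node *matchPackedSource(const Node *N) {
  if (N->Opcode != Op::ExtractElement || N->Operands.size() != 1)
    return nullptr;
  const Node *Pack = N->Operands[0];
  if (Pack->Opcode != Op::CvtPkrtz)
    return nullptr;
  assert(Pack->Operands.size() == 2 && "cvt.pkrtz takes two float inputs");
  if (N->Lane > 1)
    return nullptr;
  const Node *Src = Pack->Operands[N->Lane];
  return Src->Opcode == Op::ImageSample ? Src : nullptr;
}

// Scalar form (assumed cvt.rtz): one hop from the conversion to its input,
// with no lane bookkeeping.
const Node *matchScalarSource(const Node *N) {
  if (N->Opcode != Op::CvtRtz || N->Operands.size() != 1)
    return nullptr;
  const Node *Src = N->Operands[0];
  return Src->Opcode == Op::ImageSample ? Src : nullptr;
}
```

For `extractelement(cvt.pkrtz(%sample, %any), 0)` the packed matcher returns the sample node, while lane 1 yields null because `%any` is not a sample. The scalar matcher reaches the same answer without tracking which lane was extracted, which is the simplification the suggestion is after.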
https://github.com/llvm/llvm-project/pull/145203