[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)
Princeton Ferro via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 10 10:47:44 PDT 2025
================
@@ -3595,6 +3635,9 @@ NVPTXTargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
const auto VectorInfo = VectorizePTXValueVTs(VTs, Offsets, RetAlign);
unsigned I = 0;
for (const unsigned NumElts : VectorInfo) {
+ // amount to subdivide. If not v2f32, we don't consider packing
+ const unsigned PackingAmt = VTs[I] == MVT::v2f32 ? 2 : 1;
----------------
Prince781 wrote:
This change lets us keep generating `st.param.v2.b32` for `ret <2 x float>` rather than `st.param.b64`, so that the code generated for pre-SM100 targets stays unchanged. It may make sense to do the same for `v2f16` / `v2bf16`, but I'd rather explore that in a separate PR.
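To illustrate the effect described above, here is a minimal standalone sketch in C++. It is not code from the PR: `ToyVT`, `packingAmt`, and `toyStoreMnemonic` are hypothetical names, and the model assumes only the three value types listed. It shows how treating one `v2f32` value as two 32-bit pieces yields a `v2.b32` store, while leaving it unsplit would yield a single `b64` store.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the few value types relevant to this example.
enum class ToyVT { f32, v2f16, v2f32 };

// Mirrors the ternary in the hunk: only v2f32 is subdivided into two pieces.
constexpr unsigned packingAmt(ToyVT VT) {
  return VT == ToyVT::v2f32 ? 2 : 1;
}

// Total width in bits of one value of the given toy type.
constexpr unsigned totalBits(ToyVT VT) {
  return VT == ToyVT::v2f32 ? 64 : 32;
}

// Builds a toy store mnemonic for a chunk of NumElts values of type VT.
// With Subdivide == true, each value is split into packingAmt(VT) pieces;
// with Subdivide == false, each value is stored as one untyped piece.
std::string toyStoreMnemonic(ToyVT VT, unsigned NumElts, bool Subdivide) {
  assert(NumElts > 0 && "a chunk must hold at least one value");
  const unsigned Amt = Subdivide ? packingAmt(VT) : 1;
  const unsigned Pieces = NumElts * Amt;
  const unsigned BitsPerPiece = totalBits(VT) / Amt;

  std::string Mnemonic = "st.param.";
  if (Pieces > 1)
    Mnemonic += "v" + std::to_string(Pieces) + ".";
  Mnemonic += "b" + std::to_string(BitsPerPiece);
  return Mnemonic;
}
```

In this toy model, `toyStoreMnemonic(ToyVT::v2f32, 1, true)` produces `st.param.v2.b32` and the same call with `false` produces `st.param.b64`; types other than `v2f32` are unaffected by the flag.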
https://github.com/llvm/llvm-project/pull/126337