[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)

Tue Jun 10 10:47:44 PDT 2025

================
@@ -3595,6 +3635,9 @@ NVPTXTargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
   const auto VectorInfo = VectorizePTXValueVTs(VTs, Offsets, RetAlign);
   unsigned I = 0;
   for (const unsigned NumElts : VectorInfo) {
+    // amount to subdivide. If not v2f32, we don't consider packing
+    const unsigned PackingAmt = VTs[I] == MVT::v2f32 ? 2 : 1;
----------------
Prince781 wrote:

This change lets us keep generating `st.param.v2.b32` for `ret <2 x float>` instead of `st.param.b64`, so that pre-SM100 codegen stays unchanged. It may make sense to do the same for `v2f16` / `v2bf16`, but I'd rather explore that in a separate PR.
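
As a rough illustration of the subdivision (a standalone sketch, not the code in this PR): `SimpleVT`, `packingAmt` and `scalarOperandCount` are hypothetical names, and the rule "operands = NumElts * PackingAmt" is an assumption about how `PackingAmt` is used further down in the loop.

```cpp
// Hypothetical stand-in for the value types involved; this is not LLVM's MVT.
enum class SimpleVT { f32, v2f32, v2f16 };

// Mirrors the PackingAmt expression in the hunk: only v2f32 is subdivided.
constexpr unsigned packingAmt(SimpleVT VT) {
  return VT == SimpleVT::v2f32 ? 2u : 1u;
}

// Assumption: a chunk of NumElts values of type VT is stored as
// NumElts * PackingAmt scalar operands.
constexpr unsigned scalarOperandCount(SimpleVT VT, unsigned NumElts) {
  return NumElts * packingAmt(VT);
}

// One v2f32 value becomes two 32-bit operands (the st.param.v2.b32 shape)
// rather than a single 64-bit operand (the st.param.b64 shape).
static_assert(scalarOperandCount(SimpleVT::v2f32, 1) == 2, "v2f32 is split");
// Other types, including v2f16, are left as they are.
static_assert(scalarOperandCount(SimpleVT::v2f16, 1) == 1, "v2f16 not split");
```

Extending the same treatment to `v2f16` / `v2bf16` would amount to widening the condition in `packingAmt`, which is the follow-up mentioned above.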

https://github.com/llvm/llvm-project/pull/126337