[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Wed Dec 10 23:50:55 PST 2025

================
@@ -73,6 +73,15 @@ class VOP_Pseudo <string opName, string suffix, VOPProfile P, dag outs, dag ins,
   bit IsTrue16 = P.IsTrue16;
   VOPProfile Pfl = P;
 
+  // True if destination is FP16 and all sources are 16-bit (FP16, BF16, or INT16).
+  // Used for V_PACK_B32_F16 optimization in SIPeepholeSDWA Pass.
+  bit IsSrcDestFP16 = !and(
----------------
vg0204 wrote:

> So, one general observation - I know the instruction is named v_pack_b32_f16 ... but do we know if it has any f16-specific semantics? That is, could it have been named v_pack_b32_b16? If so, we don't need the "all sources are fp16" constraint?

The VOPProfile in its definition strictly confirms the f16 semantics: `defm V_PACK_B32_F16 : VOP3Inst_t16 <"v_pack_b32_f16", VOP_B32_F16_F16>;`, which can be found in VOP3Instructions.td.

https://github.com/llvm/llvm-project/pull/137137