[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)
Krzysztof Drewniak via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 10 10:49:32 PST 2025
================
@@ -73,6 +73,15 @@ class VOP_Pseudo <string opName, string suffix, VOPProfile P, dag outs, dag ins,
bit IsTrue16 = P.IsTrue16;
VOPProfile Pfl = P;
+ // True if destination is FP16 and all sources are 16-bit (FP16, BF16, or INT16).
+ // Used for V_PACK_B32_F16 optimization in SIPeepholeSDWA Pass.
+ bit IsSrcDestFP16 = !and(
----------------
krzysz00 wrote:
One general observation: I know the instruction is named `v_pack_b32_f16`, but do we know whether it has any `f16`-specific semantics? That is, could it just as well have been named `v_pack_b32_b16`? If so, do we need the "all sources are fp16" constraint at all?
Second, as something to think about ... can this OR be done by querying existing properties in C++? (But also I think a table's fine)
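To make the first question concrete, here is a minimal sketch of what a purely bitwise pack would look like. It assumes, without confirmation from the ISA documentation, that the two 16-bit lanes are copied verbatim, with the first source in the low half. The helpers `packB16`, `unpackLo`, `unpackHi`, and `checkRoundTrip` are hypothetical names for illustration, not LLVM APIs. If the hardware does anything type-aware to the inputs (for example quieting a signaling NaN or flushing a denormal), it would differ from this model, and the source-type constraint would matter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not an LLVM API): pack two 16-bit lanes into one
// 32-bit value, low lane first, without interpreting the bits at all.
constexpr uint32_t packB16(uint16_t Lo, uint16_t Hi) {
  return static_cast<uint32_t>(Lo) | (static_cast<uint32_t>(Hi) << 16);
}

// Extract the low 16-bit lane of a packed value.
constexpr uint16_t unpackLo(uint32_t V) {
  return static_cast<uint16_t>(V & 0xFFFFu);
}

// Extract the high 16-bit lane of a packed value.
constexpr uint16_t unpackHi(uint32_t V) {
  return static_cast<uint16_t>(V >> 16);
}

// Under this model every bit pattern survives a pack/unpack round trip,
// regardless of whether it encodes an f16, a bf16, or an i16.
inline void checkRoundTrip(uint16_t Lo, uint16_t Hi) {
  uint32_t Packed = packB16(Lo, Hi);
  assert(unpackLo(Packed) == Lo && unpackHi(Packed) == Hi);
}
```

Calling `checkRoundTrip(0x7C01, 0x0001)` exercises an f16 signaling-NaN bit pattern and the smallest f16 denormal, which are the inputs most likely to expose any `f16`-specific behaviour when compared against real hardware.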
https://github.com/llvm/llvm-project/pull/137137