[llvm] [AMDGPU][True16][GlobalISel] Fix v2*16 build_vector patterns (PR #151496)
Brox Chen via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 31 08:40:01 PDT 2025
================
@@ -3543,15 +3543,29 @@ def : GCNPat <
(vecTy (UniformBinFrag<build_vector> (Ty undef), (Ty SReg_32:$src1))),
(S_LSHL_B32 SReg_32:$src1, (i32 16))
>;
-}
def : GCNPat <
(vecTy (DivergentBinFrag<build_vector> (Ty undef), (Ty VGPR_32:$src1))),
(vecTy (V_LSHLREV_B32_e64 (i32 16), VGPR_32:$src1))
>;
-} // End foreach Ty = ...
}
+let True16Predicate = UseRealTrue16Insts in
+def : GCNPat <
+ (vecTy (DivergentBinFrag<build_vector> (Ty undef), (Ty VGPR_32:$src1))),
+ (REG_SEQUENCE VGPR_32, (Ty (IMPLICIT_DEF)), lo16, (Ty VGPR_32:$src1), hi16)
----------------
broxigarchen wrote:
Oh, I just saw that Ty is in the foreach loop over i16/f16/bf16. In that case, I think this V_LSHLREV_B32_e64 pattern may only be needed for non-true16 mode.
Can you try removing this and see if ISel can get it right? Otherwise, maybe try replacing it with this:
(REG_SEQUENCE VGPR_32, (Ty (IMPLICIT_DEF)), lo16, (Ty VGPR_16:$src1), hi16)
> Do you have an example of this?
We should not take a 16-bit input into a vgpr32 in true16 mode. The 16-bit value could be the output of a true16 instruction, in which case ISel will insert a "vgpr32 = COPY vgpr16" to match the register class, which is illegal.
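
For reference, a rough sketch of what the suggested replacement might look like as a complete pattern. It assumes the source operand is also switched to VGPR_16 and that the pattern stays inside the existing foreach over Ty/vecTy. Neither is confirmed in this thread, and the fragment is untested:

```
let True16Predicate = UseRealTrue16Insts in
def : GCNPat <
  // Assumption: match the 16-bit operand as VGPR_16 so that no
  // "vgpr32 = COPY vgpr16" has to be inserted to fix up the register class.
  (vecTy (DivergentBinFrag<build_vector> (Ty undef), (Ty VGPR_16:$src1))),
  // The low half is left undefined; the 16-bit source becomes the high half.
  (REG_SEQUENCE VGPR_32, (Ty (IMPLICIT_DEF)), lo16, (Ty VGPR_16:$src1), hi16)
>;
```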
https://github.com/llvm/llvm-project/pull/151496