[llvm] [AMDGPU][True16][CodeGen] S_PACK_XX_B32_B16 lowering for true16 mode (PR #162389)
Brox Chen via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 14 14:04:05 PDT 2025
================
@@ -9084,6 +9115,63 @@ void SIInstrInfo::movePackToVALU(SIInstrWorklist &Worklist,
MachineOperand &Src1 = Inst.getOperand(2);
const DebugLoc &DL = Inst.getDebugLoc();
+ if (ST.useRealTrue16Insts()) {
+ Register SrcReg0 = Src0.getReg();
+ Register SrcReg1 = Src1.getReg();
+
+ if (!RI.isVGPR(MRI, SrcReg0)) {
+ SrcReg0 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+ BuildMI(*MBB, Inst, DL, get(AMDGPU::V_MOV_B32_e32), SrcReg0).add(Src0);
----------------
broxigarchen wrote:
Thanks for pointing this out. For the non-register case, I added a new condition here, along with a new test.
For the register case, replacing this with a COPY causes a problem: the pack lowering creates a `vgpr32 = COPY sreg32` followed by a `vgpr_hi16 = COPY hi16:vgpr32`, and machine copy propagation will then try to fold the pair into `vgpr_hi16 = COPY sreg_hi16`.
I could certainly lower this COPY in PostRAExpand, but since we don't want to generate sgpr_16 in the pipeline, this might cause unexpected issues in other passes with a larger test case.
I think it's better to merge this patch as is to unblock the downstream branch, and create a follow-up patch to address this issue properly.
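
To make the folding concern concrete, here is a toy model of the two-copy sequence described above. It is only a sketch: `Bank`, `Part`, `Operand`, `Copy`, `foldCopy`, `readsSgpr16` and `exampleProducesSgpr16` are hypothetical names invented for illustration, not LLVM classes or APIs, and the fold rule is a simplified stand-in for what machine copy propagation does.

```cpp
#include <cassert>
#include <optional>

// Toy model only: none of these types exist in LLVM.
enum class Bank { SGPR, VGPR };
enum class Part { Full32, Lo16, Hi16 };

struct Operand {
  Bank bank;  // register bank of the operand
  int vreg;   // hypothetical virtual register id
  Part part;  // which part of the 32-bit register is accessed
};

struct Copy {
  Operand dst;
  Operand src;
};

// Simplified copy forwarding: if `use` reads (any part of) the register that
// `def` wrote as a full 32-bit copy, rewrite `use` to read the same part of
// `def`'s source instead.
std::optional<Copy> foldCopy(const Copy &def, const Copy &use) {
  if (def.dst.part != Part::Full32 || def.src.part != Part::Full32)
    return std::nullopt;
  if (use.src.bank != def.dst.bank || use.src.vreg != def.dst.vreg)
    return std::nullopt;
  Copy folded = use;
  folded.src = Operand{def.src.bank, def.src.vreg, use.src.part};
  return folded;
}

// True when a copy reads a 16-bit half of an SGPR, i.e. the sgpr_16 operand
// the comment above wants to keep out of the pipeline.
bool readsSgpr16(const Copy &c) {
  return c.src.bank == Bank::SGPR && c.src.part != Part::Full32;
}

// Builds `vgpr32 = COPY sreg32` and `vgpr_hi16 = COPY hi16:vgpr32`, folds
// them, and reports whether the result reads a 16-bit SGPR half.
bool exampleProducesSgpr16() {
  Copy def{{Bank::VGPR, 1, Part::Full32}, {Bank::SGPR, 0, Part::Full32}};
  Copy use{{Bank::VGPR, 2, Part::Hi16}, {Bank::VGPR, 1, Part::Hi16}};
  std::optional<Copy> folded = foldCopy(def, use);
  assert(folded.has_value());
  return readsSgpr16(*folded);
}
```

In this model neither original copy reads an SGPR half, but the folded copy does, which is the `vgpr_hi16 = COPY sreg_hi16` form mentioned above.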
https://github.com/llvm/llvm-project/pull/162389