[llvm] [AMDGPU][True16][CodeGen] legalize 16bit and 32bit use-def chain for moveToVALU in si-fix-sgpr-lowering (PR #138734)
Brox Chen via llvm-commits
llvm-commits at lists.llvm.org
Mon May 12 12:06:48 PDT 2025
================
@@ -7787,8 +7807,19 @@ void SIInstrInfo::moveToVALUImpl(SIInstrWorklist &Worklist,
.addReg(Undef)
.addImm(AMDGPU::hi16);
Inst.eraseFromParent();
-
MRI.replaceRegWith(DstReg, NewDstReg);
+ // legalize useMI with mismatched size
+ for (MachineRegisterInfo::use_iterator I = MRI.use_begin(NewDstReg),
+ E = MRI.use_end();
+ I != E; ++I) {
+ MachineInstr &UseMI = *I->getParent();
+ unsigned UseMIOpcode = UseMI.getOpcode();
+ if (AMDGPU::isTrue16Inst(UseMIOpcode) &&
+ (16 ==
+ RI.getRegSizeInBits(*getOpRegClass(UseMI, I.getOperandNo())))) {
+ I->setSubReg(AMDGPU::lo16);
+ }
+ }
----------------
broxigarchen wrote:
Two things:
1. The main problem is with the `replaceRegWith()` call. If there is no size mismatch, we can simply switch the register to the equivalent VGPR class. But with a size mismatch we might end up replacing a 16-bit register with a 32-bit one (or vice versa), and thus we need to traverse the use list.
2. We replace `COPY`-like instructions first and then process the use instructions in sequence. However, for a True16 instruction with multiple operands, the use instruction might be lowered before all of its COPY operands have been processed, e.g.:
```
(1) %0:vgpr_16 = IMPLICIT_DEF
(2) %1:sgpr_lo16 = COPY %0:vgpr_16
(3) %2:sreg_32 = COPY %0:vgpr_16
(4) %3:sreg_32 = COPY %1:sgpr_lo16
(5) %4:sreg_32 = S_FMAC_F16 %3:sreg_32, %3:sreg_32, %2:sreg_32, implicit $mode
```
The lowering order here is (3)->(2)->(5)->(4), because (4) only becomes a VGPR-to-SGPR copy after (2) is lowered. So when we lower (4) we hit this problem and need to check its use instructions again.
There are multiple ways to fix this. What I am currently doing is adding a use-list check when we process an instruction with a mismatched size, as sketched below.
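For illustration, here is a hand-written sketch of what (4) and (5) could look like once (4) is lowered; the virtual register numbers, the True16 opcode name and its operand list are simplified and are not taken from the actual lowering output. (5'') shows the same instruction after the use-list check rewrites the mismatched uses to the lo16 subregister:
```
(4')  %5:vgpr_32 = COPY %1
(5')  %6:vgpr_16 = V_FMAC_F16_t16 %5:vgpr_32, %5:vgpr_32, %7:vgpr_32, implicit $mode

(5'') %6:vgpr_16 = V_FMAC_F16_t16 %5.lo16:vgpr_32, %5.lo16:vgpr_32, %7.lo16:vgpr_32, implicit $mode
```
In (5') the True16 sources are 16-bit operands that now read 32-bit registers; after the use-list check they read only the low 16 bits, so the operand size matches again.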
https://github.com/llvm/llvm-project/pull/138734