[llvm] [AMDGPU][True16][CodeGen] legalize 16bit and 32bit use-def chain for moveToVALU in si-fix-sgpr-lowering (PR #138734)
Brox Chen via llvm-commits
llvm-commits at lists.llvm.org
Mon May 12 12:06:48 PDT 2025
================
@@ -7787,8 +7807,19 @@ void SIInstrInfo::moveToVALUImpl(SIInstrWorklist &Worklist,
.addReg(Undef)
.addImm(AMDGPU::hi16);
Inst.eraseFromParent();
-
MRI.replaceRegWith(DstReg, NewDstReg);
+ // legalize useMI with mismatched size
+ for (MachineRegisterInfo::use_iterator I = MRI.use_begin(NewDstReg),
+ E = MRI.use_end();
+ I != E; ++I) {
+ MachineInstr &UseMI = *I->getParent();
+ unsigned UseMIOpcode = UseMI.getOpcode();
+ if (AMDGPU::isTrue16Inst(UseMIOpcode) &&
+ (16 ==
+ RI.getRegSizeInBits(*getOpRegClass(UseMI, I.getOperandNo())))) {
+ I->setSubReg(AMDGPU::lo16);
+ }
+ }
----------------
broxigarchen wrote:
Two things:
1. The main problem is with the `replaceRegWith()` call. If there is no size mismatch, we can simply switch the register to the equivalent VGPR class. But with a size mismatch we might end up replacing a 16-bit register with a 32-bit one (or vice versa), and thus we need to traverse the use list.
2. We replace `COPY`-like instructions first and then process the use instructions in sequence. However, for a True16 instruction with multiple operands, the use instruction might be lowered before all of its COPY operands have been processed, e.g.:
```
(1) %0:vgpr_16 = IMPLICIT_DEF
(2) %1:sgpr_lo16 = COPY %0:vgpr_16
(3) %2:sreg_32 = COPY %0:vgpr_16
(4) %3:sreg_32 = COPY %1:sgpr_lo16
(5) %4:sreg_32 = S_FMAC_F16 %3:sreg_32, %3:sreg_32, %2:sreg_32, implicit $mode
```
The lowering order here is (3)->(2)->(5)->(4), because (4) only becomes a VGPR-to-SGPR copy after (2) is lowered. So when we lower (4) we hit this problem and need to check its use instructions again.
There are multiple ways to fix this. What I am currently doing is adding a use-list check when we process an instruction with a mismatched size, as sketched below.
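For illustration, here is a hand-written sketch of what (4) and (5) could look like once (4) is lowered; the virtual register numbers, the True16 opcode name and its operand list are simplified and are not taken from the actual lowering output. (5'') shows the same instruction after the use-list check rewrites the mismatched uses to the lo16 subregister:
```
(4')  %5:vgpr_32 = COPY %1
(5')  %6:vgpr_16 = V_FMAC_F16_t16 %5:vgpr_32, %5:vgpr_32, %7:vgpr_32, implicit $mode

(5'') %6:vgpr_16 = V_FMAC_F16_t16 %5.lo16:vgpr_32, %5.lo16:vgpr_32, %7.lo16:vgpr_32, implicit $mode
```
In (5') the True16 sources are 16-bit operands that now read 32-bit registers; after the use-list check they read only the low 16 bits, so the operand size matches again.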
https://github.com/llvm/llvm-project/pull/138734