[llvm] [AMDGPU][True16][CodeGen] legalize 16bit and 32bit use-def chain for moveToVALU in si-fix-sgpr-lowering (PR #138734)

Tue May 27 14:56:44 PDT 2025

================
@@ -7787,8 +7807,19 @@ void SIInstrInfo::moveToVALUImpl(SIInstrWorklist &Worklist,
             .addReg(Undef)
             .addImm(AMDGPU::hi16);
         Inst.eraseFromParent();
-
         MRI.replaceRegWith(DstReg, NewDstReg);
+        // legalize useMI with mismatched size
+        for (MachineRegisterInfo::use_iterator I = MRI.use_begin(NewDstReg),
+                                               E = MRI.use_end();
+             I != E; ++I) {
+          MachineInstr &UseMI = *I->getParent();
+          unsigned UseMIOpcode = UseMI.getOpcode();
+          if (AMDGPU::isTrue16Inst(UseMIOpcode) &&
+              (16 ==
+               RI.getRegSizeInBits(*getOpRegClass(UseMI, I.getOperandNo())))) {
+            I->setSubReg(AMDGPU::lo16);
+          }
+        }
----------------
Sisyph wrote:

Is there a way to fix this with iteration order? If (5) is indeed processed by moveToVALUImpl before all of it's operands, we need to do the use list scan when we replace any register that could be used in a true16 instruction. But if we could guarantee the arguments would be processed before the useMI, then I think the call to legalizeOperandsVALUt16 would fix up these cases.

https://github.com/llvm/llvm-project/pull/138734