[llvm] [AMDGPU] Generate COPY for each use-constraint instead of constraining the register class (PR #182104)

Chinmay Deshpande via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 18 11:04:10 PST 2026


================
@@ -8354,10 +8334,33 @@ void SIInstrInfo::moveToVALUImpl(SIInstrWorklist &Worklist,
           llvm_unreachable("failed to constrain register");
 
         Inst.eraseFromParent();
-        // Legalize t16 operand since replaceReg is called after addUsersToVALU
-        for (MachineOperand &MO :
+
+        const TargetRegisterClass *NewDstRegRC = MRI.getRegClass(NewDstReg);
+        for (MachineOperand &UseMO :
              make_early_inc_range(MRI.use_operands(NewDstReg))) {
-          legalizeOperandsVALUt16(*MO.getParent(), MRI);
+          MachineInstr &UseMI = *UseMO.getParent();
+
+          // Legalize t16 operands since replaceReg is called after
+          // addUsersToVALU.
+          legalizeOperandsVALUt16(UseMI, MRI);
+
+          // If a user operand requires a narrower register class than
+          // NewDstReg (e.g., VGPR_32_Lo256 for WMMA scale operands), emit
+          // a COPY to a new register with the correct class.
+          unsigned OpIdx = UseMI.getOperandNo(&UseMO);
+          const TargetRegisterClass *OpRC =
+              getRegClass(UseMI.getDesc(), OpIdx);
----------------
chinmaydd wrote:

```suggestion
          const TargetRegisterClass *OpRC = getRegClass(UseMI.getDesc(), OpIdx);
```

https://github.com/llvm/llvm-project/pull/182104

