[llvm] d21182d - AMDGPU/GlobalISel: Only map VOP operands to VGPRs

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 30 05:32:44 PST 2020


Author: Matt Arsenault
Date: 2020-01-30T08:32:35-05:00
New Revision: d21182d692e8109fdc3af1eb52dd293fbb3e876f

URL: https://github.com/llvm/llvm-project/commit/d21182d692e8109fdc3af1eb52dd293fbb3e876f
DIFF: https://github.com/llvm/llvm-project/commit/d21182d692e8109fdc3af1eb52dd293fbb3e876f.diff

LOG: AMDGPU/GlobalISel: Only map VOP operands to VGPRs

This trivially avoids violating the constant bus restriction.

Previously this was allowing one SGPR in the first source
operand, which technically also avoided violating this for most
operations (but not for special cases reading vcc).

We do need to write some new, smarter operand folds to pick the
optimal SGPR to use in some kind of post-isel fold, but that's purely
an optimization.

I was originally thinking we would pick which operands should be SGPRs
in RegBankSelect, but I think this isn't really manageable. There
would be additional complexity to handle every G_* instruction, and
then any nontrivial instruction patterns would need to know when to
avoid violating it, which is likely to be very error prone.

I think having all inputs being canonically copies to VGPRs will
simplify the operand folding logic. The current folding we do is
backwards, and only considers one operand at a time, relative to
operands it already has. It therefore poorly handles the case where
there is already a constant bus operand user. If all operands are
copies, it's somewhat simpler to consider all input operands at once
to choose the optimal constant bus user.

Since the failure mode for constant bus violations is now a verifier
error and not an selection failure, this moves towards a place where
we can turn on the fallback mode. The SGPR copy folding optimizations
can be left for later.

Added: 
    llvm/test/CodeGen/AMDGPU/GlobalISel/constant-bus-restriction.ll

Modified: 
    llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
    llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.dec.ll
    llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.inc.ll
    llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll
    llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-add.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.class.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.cvt.pkrtz.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.fmas.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.scale.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fcmp.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fmul.legacy.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.icmp.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgpu-ffbh-u32.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-and.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ashr.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fadd.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fcanonicalize.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fceil.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fexp2.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-flog2.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fma.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fmul.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fpext.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptosi.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptoui.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptrunc.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-frint.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsqrt.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsub.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-intrinsic-trunc.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-lshr.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mul.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-or.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-shl.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sitofp.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smax.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smin.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smulh.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sub.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-uitofp.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umax.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umin.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umulh.mir
    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-xor.mir

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index 1f56dd4a1e4c..67ef4e4b7513 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -533,24 +533,6 @@ AMDGPURegisterBankInfo::getInstrAlternativeMappings(
          AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
       3); // Num Operands
     AltMappings.push_back(&VVMapping);
-
-    const InstructionMapping &SVMapping = getInstructionMapping(
-      3, 3, getOperandsMapping(
-        {AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
-         AMDGPU::getValueMappingSGPR64Only(AMDGPU::SGPRRegBankID, Size),
-         AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
-      3); // Num Operands
-    AltMappings.push_back(&SVMapping);
-
-    // SGPR in LHS is slightly preferrable, so make it VS more expensive than
-    // SV.
-    const InstructionMapping &VSMapping = getInstructionMapping(
-      3, 4, getOperandsMapping(
-        {AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
-         AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
-         AMDGPU::getValueMappingSGPR64Only(AMDGPU::SGPRRegBankID, Size)}),
-      3); // Num Operands
-    AltMappings.push_back(&VSMapping);
     break;
   }
   case TargetOpcode::G_LOAD:
@@ -600,22 +582,6 @@ AMDGPURegisterBankInfo::getInstrAlternativeMappings(
       4); // Num Operands
     AltMappings.push_back(&SSMapping);
 
-    const InstructionMapping &SVMapping = getInstructionMapping(2, 1,
-      getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
-                          nullptr, // Predicate operand.
-                          AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
-                          AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size)}),
-      4); // Num Operands
-    AltMappings.push_back(&SVMapping);
-
-    const InstructionMapping &VSMapping = getInstructionMapping(3, 1,
-      getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
-                          nullptr, // Predicate operand.
-                          AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
-                          AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
-      4); // Num Operands
-    AltMappings.push_back(&VSMapping);
-
     const InstructionMapping &VVMapping = getInstructionMapping(4, 1,
       getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
                           nullptr, // Predicate operand.
@@ -650,10 +616,8 @@ AMDGPURegisterBankInfo::getInstrAlternativeMappings(
   case TargetOpcode::G_SMAX:
   case TargetOpcode::G_UMIN:
   case TargetOpcode::G_UMAX: {
-    static const OpRegBankEntry<3> Table[4] = {
+    static const OpRegBankEntry<3> Table[2] = {
       { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
-      { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
-      { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
 
       // Scalar requires cmp+select, and extends if 16-bit.
       // FIXME: Should there be separate costs for 32 and 16-bit
@@ -2440,31 +2404,19 @@ AMDGPURegisterBankInfo::getDefaultMappingVOP(const MachineInstr &MI) const {
   const MachineFunction &MF = *MI.getParent()->getParent();
   const MachineRegisterInfo &MRI = MF.getRegInfo();
   SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
-  unsigned OpdIdx = 0;
-
-  unsigned Size0 = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
-  OpdsMapping[OpdIdx++] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size0);
-
-  if (MI.getOperand(OpdIdx).isIntrinsicID())
-    OpdsMapping[OpdIdx++] = nullptr;
 
-  Register Reg1 = MI.getOperand(OpdIdx).getReg();
-  unsigned Size1 = getSizeInBits(Reg1, MRI, *TRI);
-
-  unsigned DefaultBankID = Size1 == 1 ?
-    AMDGPU::VCCRegBankID : AMDGPU::VGPRRegBankID;
-  unsigned Bank1 = getRegBankID(Reg1, MRI, *TRI, DefaultBankID);
-
-  OpdsMapping[OpdIdx++] = AMDGPU::getValueMapping(Bank1, Size1);
-
-  for (unsigned e = MI.getNumOperands(); OpdIdx != e; ++OpdIdx) {
-    const MachineOperand &MO = MI.getOperand(OpdIdx);
-    if (!MO.isReg())
+  // Even though we technically could use SGPRs, this would require knowledge of
+  // the constant bus restriction. Force all sources to VGPR (except for VCC).
+  //
+  // TODO: Unary ops are trivially OK, so accept SGPRs?
+  for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
+    const MachineOperand &Src = MI.getOperand(i);
+    if (!Src.isReg())
       continue;
 
-    unsigned Size = getSizeInBits(MO.getReg(), MRI, *TRI);
+    unsigned Size = getSizeInBits(Src.getReg(), MRI, *TRI);
     unsigned BankID = Size == 1 ? AMDGPU::VCCRegBankID : AMDGPU::VGPRRegBankID;
-    OpdsMapping[OpdIdx] = AMDGPU::getValueMapping(BankID, Size);
+    OpdsMapping[i] = AMDGPU::getValueMapping(BankID, Size);
   }
 
   return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
@@ -3298,11 +3250,8 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
       OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Dst1Size);
 
       unsigned SrcSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
-      OpdsMapping[3] = AMDGPU::getValueMapping(
-        getRegBankID(MI.getOperand(3).getReg(), MRI, *TRI), SrcSize);
-      OpdsMapping[4] = AMDGPU::getValueMapping(
-        getRegBankID(MI.getOperand(4).getReg(), MRI, *TRI), SrcSize);
-
+      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
+      OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
       break;
     }
     case Intrinsic::amdgcn_class: {
@@ -3312,10 +3261,8 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
       unsigned Src1Size = MRI.getType(Src1Reg).getSizeInBits();
       unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
       OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
-      OpdsMapping[2] = AMDGPU::getValueMapping(getRegBankID(Src0Reg, MRI, *TRI),
-                                               Src0Size);
-      OpdsMapping[3] = AMDGPU::getValueMapping(getRegBankID(Src1Reg, MRI, *TRI),
-                                               Src1Size);
+      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src0Size);
+      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src1Size);
       break;
     }
     case Intrinsic::amdgcn_icmp:
@@ -3324,10 +3271,8 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
       // This is not VCCRegBank because this is not used in boolean contexts.
       OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
       unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
-      unsigned Op1Bank = getRegBankID(MI.getOperand(2).getReg(), MRI, *TRI);
-      unsigned Op2Bank = getRegBankID(MI.getOperand(3).getReg(), MRI, *TRI);
-      OpdsMapping[2] = AMDGPU::getValueMapping(Op1Bank, OpSize);
-      OpdsMapping[3] = AMDGPU::getValueMapping(Op2Bank, OpSize);
+      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
+      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
       break;
     }
     case Intrinsic::amdgcn_readlane: {

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/constant-bus-restriction.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/constant-bus-restriction.ll
new file mode 100644
index 000000000000..a81255f1ab60
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/constant-bus-restriction.ll
@@ -0,0 +1,360 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -stop-after=regbankselect -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GFX9 %s
+; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -stop-after=regbankselect -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GFX10 %s
+
+; Make sure we don't violate the constant bus restriction
+; FIXME: Make this test isa output when div.fmas works.
+
+
+define amdgpu_ps float @fmul_s_s(float inreg %src0, float inreg %src1) {
+  ; GFX9-LABEL: name: fmul_s_s
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $sgpr3
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY2]], [[COPY3]]
+  ; GFX9:   $vgpr0 = COPY [[FMUL]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: fmul_s_s
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $sgpr3
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY2]], [[COPY3]]
+  ; GFX10:   $vgpr0 = COPY [[FMUL]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %result = fmul float %src0, %src1
+  ret float %result
+}
+
+define amdgpu_ps float @fmul_ss(float inreg %src) {
+  ; GFX9-LABEL: name: fmul_ss
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY1]], [[COPY2]]
+  ; GFX9:   $vgpr0 = COPY [[FMUL]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: fmul_ss
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY1]], [[COPY2]]
+  ; GFX10:   $vgpr0 = COPY [[FMUL]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %result = fmul float %src, %src
+  ret float %result
+}
+
+; Ternary operation with 3 
diff erent SGPRs
+define amdgpu_ps float @fma_s_s_s(float inreg %src0, float inreg %src1, float inreg %src2) {
+  ; GFX9-LABEL: name: fma_s_s_s
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $sgpr3, $sgpr4
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX9:   [[COPY2:%[0-9]+]]:sgpr(s32) = COPY $sgpr4
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
+  ; GFX9:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY3]], [[COPY4]], [[COPY5]]
+  ; GFX9:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: fma_s_s_s
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $sgpr3, $sgpr4
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX10:   [[COPY2:%[0-9]+]]:sgpr(s32) = COPY $sgpr4
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
+  ; GFX10:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY3]], [[COPY4]], [[COPY5]]
+  ; GFX10:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %result = call float @llvm.fma.f32(float %src0, float %src1, float %src2)
+  ret float %result
+}
+
+; Ternary operation with 3 identical SGPRs
+define amdgpu_ps float @fma_sss(float inreg %src) {
+  ; GFX9-LABEL: name: fma_sss
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY1]], [[COPY2]], [[COPY3]]
+  ; GFX9:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: fma_sss
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY1]], [[COPY2]], [[COPY3]]
+  ; GFX10:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %result = call float @llvm.fma.f32(float %src, float %src, float %src)
+  ret float %result
+}
+
+; src0/1 are same SGPR
+define amdgpu_ps float @fma_ss_s(float inreg %src01, float inreg %src2) {
+  ; GFX9-LABEL: name: fma_ss_s
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $sgpr3
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY2]], [[COPY3]], [[COPY4]]
+  ; GFX9:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: fma_ss_s
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $sgpr3
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY2]], [[COPY3]], [[COPY4]]
+  ; GFX10:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %result = call float @llvm.fma.f32(float %src01, float %src01, float %src2)
+  ret float %result
+}
+
+; src1/2 are same SGPR
+define amdgpu_ps float @fma_s_ss(float inreg %src0, float inreg %src12) {
+  ; GFX9-LABEL: name: fma_s_ss
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $sgpr3
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY2]], [[COPY3]], [[COPY4]]
+  ; GFX9:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: fma_s_ss
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $sgpr3
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY2]], [[COPY3]], [[COPY4]]
+  ; GFX10:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %result = call float @llvm.fma.f32(float %src0, float %src12, float %src12)
+  ret float %result
+}
+
+; src0/2 are same SGPR
+define amdgpu_ps float @fma_ss_s_same_outer(float inreg %src02, float inreg %src1) {
+  ; GFX9-LABEL: name: fma_ss_s_same_outer
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $sgpr3
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY2]], [[COPY3]], [[COPY4]]
+  ; GFX9:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: fma_ss_s_same_outer
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $sgpr3
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY2]], [[COPY3]], [[COPY4]]
+  ; GFX10:   $vgpr0 = COPY [[FMA]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %result = call float @llvm.fma.f32(float %src02, float %src1, float %src02)
+  ret float %result
+}
+
+define amdgpu_ps float @fcmp_s_s(float inreg %src0, float inreg %src1) {
+  ; GFX9-LABEL: name: fcmp_s_s
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $sgpr3
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX9:   [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 1.000000e+00
+  ; GFX9:   [[C1:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 0.000000e+00
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[FCMP:%[0-9]+]]:vcc(s1) = G_FCMP floatpred(oeq), [[COPY]](s32), [[COPY2]]
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+  ; GFX9:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[C1]](s32)
+  ; GFX9:   [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[FCMP]](s1), [[COPY3]], [[COPY4]]
+  ; GFX9:   $vgpr0 = COPY [[SELECT]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: fcmp_s_s
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $sgpr3
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX10:   [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 1.000000e+00
+  ; GFX10:   [[C1:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 0.000000e+00
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[FCMP:%[0-9]+]]:vcc(s1) = G_FCMP floatpred(oeq), [[COPY]](s32), [[COPY2]]
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+  ; GFX10:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[C1]](s32)
+  ; GFX10:   [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[FCMP]](s1), [[COPY3]], [[COPY4]]
+  ; GFX10:   $vgpr0 = COPY [[SELECT]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %cmp = fcmp oeq float %src0, %src1
+  %result = select i1 %cmp, float 1.0, float 0.0
+  ret float %result
+}
+
+; Constant bus used by vcc
+define amdgpu_ps float @amdgcn_div_fmas_sss(float inreg %src, float %cmp.src) {
+  ; GFX9-LABEL: name: amdgcn_div_fmas_sss
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $vgpr0
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
+  ; GFX9:   [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 0.000000e+00
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+  ; GFX9:   [[FCMP:%[0-9]+]]:vcc(s1) = G_FCMP floatpred(oeq), [[COPY1]](s32), [[COPY2]]
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[FCMP]](s1)
+  ; GFX9:   $vgpr0 = COPY [[INT]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: amdgcn_div_fmas_sss
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $vgpr0
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
+  ; GFX10:   [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 0.000000e+00
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+  ; GFX10:   [[FCMP:%[0-9]+]]:vcc(s1) = G_FCMP floatpred(oeq), [[COPY1]](s32), [[COPY2]]
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[FCMP]](s1)
+  ; GFX10:   $vgpr0 = COPY [[INT]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %vcc = fcmp oeq float %cmp.src, 0.0
+  %result = call float @llvm.amdgcn.div.fmas.f32(float %src, float %src, float %src, i1 %vcc)
+  ret float %result
+}
+
+define amdgpu_ps float @class_s_s(float inreg %src0, i32 inreg %src1) {
+  ; GFX9-LABEL: name: class_s_s
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $sgpr3
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX9:   [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 1.000000e+00
+  ; GFX9:   [[C1:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 0.000000e+00
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[INT:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), [[COPY2]](s32), [[COPY3]](s32)
+  ; GFX9:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+  ; GFX9:   [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[C1]](s32)
+  ; GFX9:   [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[INT]](s1), [[COPY4]], [[COPY5]]
+  ; GFX9:   $vgpr0 = COPY [[SELECT]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: class_s_s
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $sgpr3
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX10:   [[C:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 1.000000e+00
+  ; GFX10:   [[C1:%[0-9]+]]:sgpr(s32) = G_FCONSTANT float 0.000000e+00
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[INT:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), [[COPY2]](s32), [[COPY3]](s32)
+  ; GFX10:   [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[C]](s32)
+  ; GFX10:   [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[C1]](s32)
+  ; GFX10:   [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[INT]](s1), [[COPY4]], [[COPY5]]
+  ; GFX10:   $vgpr0 = COPY [[SELECT]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %class = call i1 @llvm.amdgcn.class.f32(float %src0, i32 %src1)
+  %result = select i1 %class, float 1.0, float 0.0
+  ret float %result
+}
+
+define amdgpu_ps float @div_scale_s_s_true(float inreg %src0, float inreg %src1) {
+  ; GFX9-LABEL: name: div_scale_s_s_true
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $sgpr3
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY2]](s32), [[COPY3]](s32), -1
+  ; GFX9:   $vgpr0 = COPY [[INT]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: div_scale_s_s_true
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $sgpr3
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY2]](s32), [[COPY3]](s32), -1
+  ; GFX10:   $vgpr0 = COPY [[INT]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %div.scale = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %src0, float %src1, i1 true)
+  %result = extractvalue { float, i1 } %div.scale, 0
+  ret float %result
+}
+
+define amdgpu_ps float @div_scale_s_s_false(float inreg %src0, float inreg %src1) {
+  ; GFX9-LABEL: name: div_scale_s_s_false
+  ; GFX9: bb.1 (%ir-block.0):
+  ; GFX9:   liveins: $sgpr2, $sgpr3
+  ; GFX9:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX9:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX9:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX9:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX9:   [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY2]](s32), [[COPY3]](s32), 0
+  ; GFX9:   $vgpr0 = COPY [[INT]](s32)
+  ; GFX9:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ; GFX10-LABEL: name: div_scale_s_s_false
+  ; GFX10: bb.1 (%ir-block.0):
+  ; GFX10:   liveins: $sgpr2, $sgpr3
+  ; GFX10:   [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
+  ; GFX10:   [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr3
+  ; GFX10:   [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+  ; GFX10:   [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+  ; GFX10:   [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY2]](s32), [[COPY3]](s32), 0
+  ; GFX10:   $vgpr0 = COPY [[INT]](s32)
+  ; GFX10:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %div.scale = call { float, i1 } @llvm.amdgcn.div.scale.f32(float %src0, float %src1, i1 false)
+  %result = extractvalue { float, i1 } %div.scale, 0
+  ret float %result
+}
+
+declare float @llvm.fma.f32(float, float, float) #0
+declare float @llvm.amdgcn.div.fmas.f32(float, float, float, i1) #1
+declare { float, i1 } @llvm.amdgcn.div.scale.f32(float, float, i1 immarg) #1
+declare i1 @llvm.amdgcn.class.f32(float, i32) #1
+
+attributes #0 = { nounwind readnone speculatable willreturn }
+attributes #1 = { nounwind readnone speculatable }

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.dec.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.dec.ll
index aca5378486de..32bef20c9ffd 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.dec.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.dec.ll
@@ -373,17 +373,19 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_addr64(i32 addrspace
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 2
 ; CI-NEXT:    v_mov_b32_e32 v4, 42
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v2, s3
-; CI-NEXT:    v_add_i32_e32 v3, vcc, s2, v0
-; CI-NEXT:    v_addc_u32_e32 v5, vcc, v2, v1, vcc
-; CI-NEXT:    v_add_i32_e32 v2, vcc, 20, v3
-; CI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v5, vcc
-; CI-NEXT:    flat_atomic_dec v2, v[2:3], v4 glc
+; CI-NEXT:    v_mov_b32_e32 v2, s2
+; CI-NEXT:    v_mov_b32_e32 v3, s3
+; CI-NEXT:    v_add_i32_e32 v2, vcc, v2, v0
+; CI-NEXT:    v_addc_u32_e32 v3, vcc, v3, v1, vcc
+; CI-NEXT:    v_add_i32_e32 v2, vcc, 20, v2
+; CI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; CI-NEXT:    flat_atomic_dec v4, v[2:3], v4 glc
 ; CI-NEXT:    v_mov_b32_e32 v3, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; CI-NEXT:    flat_store_dword v[0:1], v2
+; CI-NEXT:    flat_store_dword v[0:1], v4
 ; CI-NEXT:    s_endpgm
 ;
 ; VI-LABEL: global_atomic_dec_ret_i32_offset_addr64:
@@ -393,17 +395,19 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_addr64(i32 addrspace
 ; VI-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
 ; VI-NEXT:    v_mov_b32_e32 v4, 42
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v2, s3
-; VI-NEXT:    v_add_u32_e32 v3, vcc, s2, v0
-; VI-NEXT:    v_addc_u32_e32 v5, vcc, v2, v1, vcc
-; VI-NEXT:    v_add_u32_e32 v2, vcc, 20, v3
-; VI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v5, vcc
-; VI-NEXT:    flat_atomic_dec v2, v[2:3], v4 glc
+; VI-NEXT:    v_mov_b32_e32 v2, s2
+; VI-NEXT:    v_mov_b32_e32 v3, s3
+; VI-NEXT:    v_add_u32_e32 v2, vcc, v2, v0
+; VI-NEXT:    v_addc_u32_e32 v3, vcc, v3, v1, vcc
+; VI-NEXT:    v_add_u32_e32 v2, vcc, 20, v2
+; VI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT:    flat_atomic_dec v4, v[2:3], v4 glc
 ; VI-NEXT:    v_mov_b32_e32 v3, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
+; VI-NEXT:    v_mov_b32_e32 v2, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v2, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; VI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; VI-NEXT:    flat_store_dword v[0:1], v2
+; VI-NEXT:    flat_store_dword v[0:1], v4
 ; VI-NEXT:    s_endpgm
 ; GFX9-LABEL: global_atomic_dec_ret_i32_offset_addr64:
 ; GFX9:       ; %bb.0:
@@ -444,14 +448,15 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32_offset_addr64(i32 addrspa
 ; CI-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; CI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 2
-; CI-NEXT:    v_mov_b32_e32 v2, 42
+; CI-NEXT:    v_mov_b32_e32 v4, 42
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
 ; CI-NEXT:    v_mov_b32_e32 v3, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    v_add_i32_e32 v0, vcc, 20, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; CI-NEXT:    flat_atomic_dec v0, v[0:1], v2 glc
+; CI-NEXT:    flat_atomic_dec v0, v[0:1], v4 glc
 ; CI-NEXT:    s_endpgm
 ;
 ; VI-LABEL: global_atomic_dec_noret_i32_offset_addr64:
@@ -459,14 +464,15 @@ define amdgpu_kernel void @global_atomic_dec_noret_i32_offset_addr64(i32 addrspa
 ; VI-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; VI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; VI-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
-; VI-NEXT:    v_mov_b32_e32 v2, 42
+; VI-NEXT:    v_mov_b32_e32 v4, 42
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
 ; VI-NEXT:    v_mov_b32_e32 v3, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
+; VI-NEXT:    v_mov_b32_e32 v2, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v2, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; VI-NEXT:    v_add_u32_e32 v0, vcc, 20, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; VI-NEXT:    flat_atomic_dec v0, v[0:1], v2 glc
+; VI-NEXT:    flat_atomic_dec v0, v[0:1], v4 glc
 ; VI-NEXT:    s_endpgm
 ; GFX9-LABEL: global_atomic_dec_noret_i32_offset_addr64:
 ; GFX9:       ; %bb.0:
@@ -674,17 +680,19 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32_offset_addr64(i32* %out, i32*
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 2
 ; CI-NEXT:    v_mov_b32_e32 v4, 42
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v2, s3
-; CI-NEXT:    v_add_i32_e32 v3, vcc, s2, v0
-; CI-NEXT:    v_addc_u32_e32 v5, vcc, v2, v1, vcc
-; CI-NEXT:    v_add_i32_e32 v2, vcc, 20, v3
-; CI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v5, vcc
-; CI-NEXT:    flat_atomic_dec v2, v[2:3], v4 glc
+; CI-NEXT:    v_mov_b32_e32 v2, s2
+; CI-NEXT:    v_mov_b32_e32 v3, s3
+; CI-NEXT:    v_add_i32_e32 v2, vcc, v2, v0
+; CI-NEXT:    v_addc_u32_e32 v3, vcc, v3, v1, vcc
+; CI-NEXT:    v_add_i32_e32 v2, vcc, 20, v2
+; CI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; CI-NEXT:    flat_atomic_dec v4, v[2:3], v4 glc
 ; CI-NEXT:    v_mov_b32_e32 v3, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; CI-NEXT:    flat_store_dword v[0:1], v2
+; CI-NEXT:    flat_store_dword v[0:1], v4
 ; CI-NEXT:    s_endpgm
 ;
 ; VI-LABEL: flat_atomic_dec_ret_i32_offset_addr64:
@@ -694,17 +702,19 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i32_offset_addr64(i32* %out, i32*
 ; VI-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
 ; VI-NEXT:    v_mov_b32_e32 v4, 42
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v2, s3
-; VI-NEXT:    v_add_u32_e32 v3, vcc, s2, v0
-; VI-NEXT:    v_addc_u32_e32 v5, vcc, v2, v1, vcc
-; VI-NEXT:    v_add_u32_e32 v2, vcc, 20, v3
-; VI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v5, vcc
-; VI-NEXT:    flat_atomic_dec v2, v[2:3], v4 glc
+; VI-NEXT:    v_mov_b32_e32 v2, s2
+; VI-NEXT:    v_mov_b32_e32 v3, s3
+; VI-NEXT:    v_add_u32_e32 v2, vcc, v2, v0
+; VI-NEXT:    v_addc_u32_e32 v3, vcc, v3, v1, vcc
+; VI-NEXT:    v_add_u32_e32 v2, vcc, 20, v2
+; VI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT:    flat_atomic_dec v4, v[2:3], v4 glc
 ; VI-NEXT:    v_mov_b32_e32 v3, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
+; VI-NEXT:    v_mov_b32_e32 v2, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v2, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; VI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; VI-NEXT:    flat_store_dword v[0:1], v2
+; VI-NEXT:    flat_store_dword v[0:1], v4
 ; VI-NEXT:    s_endpgm
 ; GFX9-LABEL: flat_atomic_dec_ret_i32_offset_addr64:
 ; GFX9:       ; %bb.0:
@@ -745,14 +755,15 @@ define amdgpu_kernel void @flat_atomic_dec_noret_i32_offset_addr64(i32* %ptr) #0
 ; CI-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; CI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 2
-; CI-NEXT:    v_mov_b32_e32 v2, 42
+; CI-NEXT:    v_mov_b32_e32 v4, 42
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
 ; CI-NEXT:    v_mov_b32_e32 v3, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    v_add_i32_e32 v0, vcc, 20, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; CI-NEXT:    flat_atomic_dec v0, v[0:1], v2 glc
+; CI-NEXT:    flat_atomic_dec v0, v[0:1], v4 glc
 ; CI-NEXT:    s_endpgm
 ;
 ; VI-LABEL: flat_atomic_dec_noret_i32_offset_addr64:
@@ -760,14 +771,15 @@ define amdgpu_kernel void @flat_atomic_dec_noret_i32_offset_addr64(i32* %ptr) #0
 ; VI-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; VI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; VI-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
-; VI-NEXT:    v_mov_b32_e32 v2, 42
+; VI-NEXT:    v_mov_b32_e32 v4, 42
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
 ; VI-NEXT:    v_mov_b32_e32 v3, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
+; VI-NEXT:    v_mov_b32_e32 v2, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v2, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; VI-NEXT:    v_add_u32_e32 v0, vcc, 20, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; VI-NEXT:    flat_atomic_dec v0, v[0:1], v2 glc
+; VI-NEXT:    flat_atomic_dec v0, v[0:1], v4 glc
 ; VI-NEXT:    s_endpgm
 ; GFX9-LABEL: flat_atomic_dec_noret_i32_offset_addr64:
 ; GFX9:       ; %bb.0:
@@ -988,15 +1000,17 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i64_offset_addr64(i64* %out, i64*
 ; CI-NEXT:    v_mov_b32_e32 v2, 42
 ; CI-NEXT:    v_mov_b32_e32 v3, 0
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v4, s3
-; CI-NEXT:    v_add_i32_e32 v5, vcc, s2, v0
-; CI-NEXT:    v_addc_u32_e32 v6, vcc, v4, v1, vcc
-; CI-NEXT:    v_add_i32_e32 v4, vcc, 40, v5
-; CI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v6, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s3
+; CI-NEXT:    v_mov_b32_e32 v4, s2
+; CI-NEXT:    v_add_i32_e32 v4, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v5, vcc, v5, v1, vcc
+; CI-NEXT:    v_add_i32_e32 v4, vcc, 40, v4
+; CI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v5, vcc
 ; CI-NEXT:    flat_atomic_dec_x2 v[2:3], v[4:5], v[2:3] glc
-; CI-NEXT:    v_mov_b32_e32 v4, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s1
+; CI-NEXT:    v_mov_b32_e32 v4, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; CI-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; CI-NEXT:    s_endpgm
@@ -1009,15 +1023,17 @@ define amdgpu_kernel void @flat_atomic_dec_ret_i64_offset_addr64(i64* %out, i64*
 ; VI-NEXT:    v_mov_b32_e32 v2, 42
 ; VI-NEXT:    v_mov_b32_e32 v3, 0
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v4, s3
-; VI-NEXT:    v_add_u32_e32 v5, vcc, s2, v0
-; VI-NEXT:    v_addc_u32_e32 v6, vcc, v4, v1, vcc
-; VI-NEXT:    v_add_u32_e32 v4, vcc, 40, v5
-; VI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v6, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s3
+; VI-NEXT:    v_mov_b32_e32 v4, s2
+; VI-NEXT:    v_add_u32_e32 v4, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v5, vcc, v5, v1, vcc
+; VI-NEXT:    v_add_u32_e32 v4, vcc, 40, v4
+; VI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v5, vcc
 ; VI-NEXT:    flat_atomic_dec_x2 v[2:3], v[4:5], v[2:3] glc
-; VI-NEXT:    v_mov_b32_e32 v4, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
-; VI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s1
+; VI-NEXT:    v_mov_b32_e32 v4, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; VI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; VI-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; VI-NEXT:    s_endpgm
@@ -1064,9 +1080,10 @@ define amdgpu_kernel void @flat_atomic_dec_noret_i64_offset_addr64(i64* %ptr) #0
 ; CI-NEXT:    v_mov_b32_e32 v2, 42
 ; CI-NEXT:    v_mov_b32_e32 v3, 0
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v4, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s1
+; CI-NEXT:    v_mov_b32_e32 v4, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; CI-NEXT:    v_add_i32_e32 v0, vcc, 40, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; CI-NEXT:    flat_atomic_dec_x2 v[0:1], v[0:1], v[2:3] glc
@@ -1080,9 +1097,10 @@ define amdgpu_kernel void @flat_atomic_dec_noret_i64_offset_addr64(i64* %ptr) #0
 ; VI-NEXT:    v_mov_b32_e32 v2, 42
 ; VI-NEXT:    v_mov_b32_e32 v3, 0
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v4, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
-; VI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s1
+; VI-NEXT:    v_mov_b32_e32 v4, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; VI-NEXT:    v_add_u32_e32 v0, vcc, 40, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; VI-NEXT:    flat_atomic_dec_x2 v[0:1], v[0:1], v[2:3] glc
@@ -1553,15 +1571,17 @@ define amdgpu_kernel void @global_atomic_dec_ret_i64_offset_addr64(i64 addrspace
 ; CI-NEXT:    v_mov_b32_e32 v2, 42
 ; CI-NEXT:    v_mov_b32_e32 v3, 0
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v4, s3
-; CI-NEXT:    v_add_i32_e32 v5, vcc, s2, v0
-; CI-NEXT:    v_addc_u32_e32 v6, vcc, v4, v1, vcc
-; CI-NEXT:    v_add_i32_e32 v4, vcc, 40, v5
-; CI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v6, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s3
+; CI-NEXT:    v_mov_b32_e32 v4, s2
+; CI-NEXT:    v_add_i32_e32 v4, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v5, vcc, v5, v1, vcc
+; CI-NEXT:    v_add_i32_e32 v4, vcc, 40, v4
+; CI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v5, vcc
 ; CI-NEXT:    flat_atomic_dec_x2 v[2:3], v[4:5], v[2:3] glc
-; CI-NEXT:    v_mov_b32_e32 v4, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s1
+; CI-NEXT:    v_mov_b32_e32 v4, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; CI-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; CI-NEXT:    s_endpgm
@@ -1574,15 +1594,17 @@ define amdgpu_kernel void @global_atomic_dec_ret_i64_offset_addr64(i64 addrspace
 ; VI-NEXT:    v_mov_b32_e32 v2, 42
 ; VI-NEXT:    v_mov_b32_e32 v3, 0
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v4, s3
-; VI-NEXT:    v_add_u32_e32 v5, vcc, s2, v0
-; VI-NEXT:    v_addc_u32_e32 v6, vcc, v4, v1, vcc
-; VI-NEXT:    v_add_u32_e32 v4, vcc, 40, v5
-; VI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v6, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s3
+; VI-NEXT:    v_mov_b32_e32 v4, s2
+; VI-NEXT:    v_add_u32_e32 v4, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v5, vcc, v5, v1, vcc
+; VI-NEXT:    v_add_u32_e32 v4, vcc, 40, v4
+; VI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v5, vcc
 ; VI-NEXT:    flat_atomic_dec_x2 v[2:3], v[4:5], v[2:3] glc
-; VI-NEXT:    v_mov_b32_e32 v4, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
-; VI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s1
+; VI-NEXT:    v_mov_b32_e32 v4, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; VI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; VI-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; VI-NEXT:    s_endpgm
@@ -1629,9 +1651,10 @@ define amdgpu_kernel void @global_atomic_dec_noret_i64_offset_addr64(i64 addrspa
 ; CI-NEXT:    v_mov_b32_e32 v2, 42
 ; CI-NEXT:    v_mov_b32_e32 v3, 0
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v4, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s1
+; CI-NEXT:    v_mov_b32_e32 v4, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; CI-NEXT:    v_add_i32_e32 v0, vcc, 40, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; CI-NEXT:    flat_atomic_dec_x2 v[0:1], v[0:1], v[2:3] glc
@@ -1645,9 +1668,10 @@ define amdgpu_kernel void @global_atomic_dec_noret_i64_offset_addr64(i64 addrspa
 ; VI-NEXT:    v_mov_b32_e32 v2, 42
 ; VI-NEXT:    v_mov_b32_e32 v3, 0
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v4, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
-; VI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s1
+; VI-NEXT:    v_mov_b32_e32 v4, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; VI-NEXT:    v_add_u32_e32 v0, vcc, 40, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; VI-NEXT:    flat_atomic_dec_x2 v[0:1], v[0:1], v[2:3] glc

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.inc.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.inc.ll
index 453fbb385a35..1f42aa8a67bc 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.inc.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.atomic.inc.ll
@@ -375,17 +375,19 @@ define amdgpu_kernel void @global_atomic_inc_ret_i32_offset_addr64(i32 addrspace
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 2
 ; CI-NEXT:    v_mov_b32_e32 v4, 42
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v2, s3
-; CI-NEXT:    v_add_i32_e32 v3, vcc, s2, v0
-; CI-NEXT:    v_addc_u32_e32 v5, vcc, v2, v1, vcc
-; CI-NEXT:    v_add_i32_e32 v2, vcc, 20, v3
-; CI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v5, vcc
-; CI-NEXT:    flat_atomic_inc v2, v[2:3], v4 glc
+; CI-NEXT:    v_mov_b32_e32 v2, s2
+; CI-NEXT:    v_mov_b32_e32 v3, s3
+; CI-NEXT:    v_add_i32_e32 v2, vcc, v2, v0
+; CI-NEXT:    v_addc_u32_e32 v3, vcc, v3, v1, vcc
+; CI-NEXT:    v_add_i32_e32 v2, vcc, 20, v2
+; CI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; CI-NEXT:    flat_atomic_inc v4, v[2:3], v4 glc
 ; CI-NEXT:    v_mov_b32_e32 v3, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; CI-NEXT:    flat_store_dword v[0:1], v2
+; CI-NEXT:    flat_store_dword v[0:1], v4
 ; CI-NEXT:    s_endpgm
 ;
 ; VI-LABEL: global_atomic_inc_ret_i32_offset_addr64:
@@ -395,17 +397,19 @@ define amdgpu_kernel void @global_atomic_inc_ret_i32_offset_addr64(i32 addrspace
 ; VI-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
 ; VI-NEXT:    v_mov_b32_e32 v4, 42
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v2, s3
-; VI-NEXT:    v_add_u32_e32 v3, vcc, s2, v0
-; VI-NEXT:    v_addc_u32_e32 v5, vcc, v2, v1, vcc
-; VI-NEXT:    v_add_u32_e32 v2, vcc, 20, v3
-; VI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v5, vcc
-; VI-NEXT:    flat_atomic_inc v2, v[2:3], v4 glc
+; VI-NEXT:    v_mov_b32_e32 v2, s2
+; VI-NEXT:    v_mov_b32_e32 v3, s3
+; VI-NEXT:    v_add_u32_e32 v2, vcc, v2, v0
+; VI-NEXT:    v_addc_u32_e32 v3, vcc, v3, v1, vcc
+; VI-NEXT:    v_add_u32_e32 v2, vcc, 20, v2
+; VI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT:    flat_atomic_inc v4, v[2:3], v4 glc
 ; VI-NEXT:    v_mov_b32_e32 v3, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
+; VI-NEXT:    v_mov_b32_e32 v2, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v2, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; VI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; VI-NEXT:    flat_store_dword v[0:1], v2
+; VI-NEXT:    flat_store_dword v[0:1], v4
 ; VI-NEXT:    s_endpgm
 ;
 ; GFX9-LABEL: global_atomic_inc_ret_i32_offset_addr64:
@@ -415,17 +419,19 @@ define amdgpu_kernel void @global_atomic_inc_ret_i32_offset_addr64(i32 addrspace
 ; GFX9-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
 ; GFX9-NEXT:    v_mov_b32_e32 v4, 42
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v2, s3
-; GFX9-NEXT:    v_add_co_u32_e32 v3, vcc, s2, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v5, vcc, v2, v1, vcc
-; GFX9-NEXT:    v_add_co_u32_e32 v2, vcc, 20, v3
-; GFX9-NEXT:    v_addc_co_u32_e32 v3, vcc, 0, v5, vcc
-; GFX9-NEXT:    global_atomic_inc v2, v[2:3], v4, off glc
+; GFX9-NEXT:    v_mov_b32_e32 v2, s2
+; GFX9-NEXT:    v_mov_b32_e32 v3, s3
+; GFX9-NEXT:    v_add_co_u32_e32 v2, vcc, v2, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v3, vcc, v3, v1, vcc
+; GFX9-NEXT:    v_add_co_u32_e32 v2, vcc, 20, v2
+; GFX9-NEXT:    v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT:    global_atomic_inc v4, v[2:3], v4, off glc
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
+; GFX9-NEXT:    v_mov_b32_e32 v2, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v2, v0
 ; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v3, v1, vcc
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-NEXT:    global_store_dword v[0:1], v2, off
+; GFX9-NEXT:    global_store_dword v[0:1], v4, off
 ; GFX9-NEXT:    s_endpgm
   %id = call i32 @llvm.amdgcn.workitem.id.x()
   %gep.tid = getelementptr i32, i32 addrspace(1)* %ptr, i32 %id
@@ -442,14 +448,15 @@ define amdgpu_kernel void @global_atomic_inc_noret_i32_offset_addr64(i32 addrspa
 ; CI-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; CI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 2
-; CI-NEXT:    v_mov_b32_e32 v2, 42
+; CI-NEXT:    v_mov_b32_e32 v4, 42
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
 ; CI-NEXT:    v_mov_b32_e32 v3, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    v_add_i32_e32 v0, vcc, 20, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; CI-NEXT:    flat_atomic_inc v0, v[0:1], v2 glc
+; CI-NEXT:    flat_atomic_inc v0, v[0:1], v4 glc
 ; CI-NEXT:    s_endpgm
 ;
 ; VI-LABEL: global_atomic_inc_noret_i32_offset_addr64:
@@ -457,14 +464,15 @@ define amdgpu_kernel void @global_atomic_inc_noret_i32_offset_addr64(i32 addrspa
 ; VI-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; VI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; VI-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
-; VI-NEXT:    v_mov_b32_e32 v2, 42
+; VI-NEXT:    v_mov_b32_e32 v4, 42
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
 ; VI-NEXT:    v_mov_b32_e32 v3, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
+; VI-NEXT:    v_mov_b32_e32 v2, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v2, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; VI-NEXT:    v_add_u32_e32 v0, vcc, 20, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; VI-NEXT:    flat_atomic_inc v0, v[0:1], v2 glc
+; VI-NEXT:    flat_atomic_inc v0, v[0:1], v4 glc
 ; VI-NEXT:    s_endpgm
 ;
 ; GFX9-LABEL: global_atomic_inc_noret_i32_offset_addr64:
@@ -472,14 +480,15 @@ define amdgpu_kernel void @global_atomic_inc_noret_i32_offset_addr64(i32 addrspa
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; GFX9-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; GFX9-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v2, 42
+; GFX9-NEXT:    v_mov_b32_e32 v4, 42
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
+; GFX9-NEXT:    v_mov_b32_e32 v2, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v2, v0
 ; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v3, v1, vcc
 ; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, 20, v0
 ; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
-; GFX9-NEXT:    global_atomic_inc v0, v[0:1], v2, off glc
+; GFX9-NEXT:    global_atomic_inc v0, v[0:1], v4, off glc
 ; GFX9-NEXT:    s_endpgm
   %id = call i32 @llvm.amdgcn.workitem.id.x()
   %gep.tid = getelementptr i32, i32 addrspace(1)* %ptr, i32 %id
@@ -936,15 +945,17 @@ define amdgpu_kernel void @global_atomic_inc_ret_i64_offset_addr64(i64 addrspace
 ; CI-NEXT:    v_mov_b32_e32 v2, 42
 ; CI-NEXT:    v_mov_b32_e32 v3, 0
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v4, s3
-; CI-NEXT:    v_add_i32_e32 v5, vcc, s2, v0
-; CI-NEXT:    v_addc_u32_e32 v6, vcc, v4, v1, vcc
-; CI-NEXT:    v_add_i32_e32 v4, vcc, 40, v5
-; CI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v6, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s3
+; CI-NEXT:    v_mov_b32_e32 v4, s2
+; CI-NEXT:    v_add_i32_e32 v4, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v5, vcc, v5, v1, vcc
+; CI-NEXT:    v_add_i32_e32 v4, vcc, 40, v4
+; CI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v5, vcc
 ; CI-NEXT:    flat_atomic_inc_x2 v[2:3], v[4:5], v[2:3] glc
-; CI-NEXT:    v_mov_b32_e32 v4, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s1
+; CI-NEXT:    v_mov_b32_e32 v4, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; CI-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; CI-NEXT:    s_endpgm
@@ -957,15 +968,17 @@ define amdgpu_kernel void @global_atomic_inc_ret_i64_offset_addr64(i64 addrspace
 ; VI-NEXT:    v_mov_b32_e32 v2, 42
 ; VI-NEXT:    v_mov_b32_e32 v3, 0
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v4, s3
-; VI-NEXT:    v_add_u32_e32 v5, vcc, s2, v0
-; VI-NEXT:    v_addc_u32_e32 v6, vcc, v4, v1, vcc
-; VI-NEXT:    v_add_u32_e32 v4, vcc, 40, v5
-; VI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v6, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s3
+; VI-NEXT:    v_mov_b32_e32 v4, s2
+; VI-NEXT:    v_add_u32_e32 v4, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v5, vcc, v5, v1, vcc
+; VI-NEXT:    v_add_u32_e32 v4, vcc, 40, v4
+; VI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v5, vcc
 ; VI-NEXT:    flat_atomic_inc_x2 v[2:3], v[4:5], v[2:3] glc
-; VI-NEXT:    v_mov_b32_e32 v4, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
-; VI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s1
+; VI-NEXT:    v_mov_b32_e32 v4, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; VI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; VI-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; VI-NEXT:    s_endpgm
@@ -978,15 +991,17 @@ define amdgpu_kernel void @global_atomic_inc_ret_i64_offset_addr64(i64 addrspace
 ; GFX9-NEXT:    v_mov_b32_e32 v2, 42
 ; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v4, s3
-; GFX9-NEXT:    v_add_co_u32_e32 v5, vcc, s2, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v6, vcc, v4, v1, vcc
-; GFX9-NEXT:    v_add_co_u32_e32 v4, vcc, 40, v5
-; GFX9-NEXT:    v_addc_co_u32_e32 v5, vcc, 0, v6, vcc
+; GFX9-NEXT:    v_mov_b32_e32 v5, s3
+; GFX9-NEXT:    v_mov_b32_e32 v4, s2
+; GFX9-NEXT:    v_add_co_u32_e32 v4, vcc, v4, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v5, vcc, v5, v1, vcc
+; GFX9-NEXT:    v_add_co_u32_e32 v4, vcc, 40, v4
+; GFX9-NEXT:    v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
 ; GFX9-NEXT:    global_atomic_inc_x2 v[2:3], v[4:5], v[2:3], off glc
-; GFX9-NEXT:    v_mov_b32_e32 v4, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v4, v1, vcc
+; GFX9-NEXT:    v_mov_b32_e32 v5, s1
+; GFX9-NEXT:    v_mov_b32_e32 v4, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v4, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v5, v1, vcc
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-NEXT:    global_store_dwordx2 v[0:1], v[2:3], off
 ; GFX9-NEXT:    s_endpgm
@@ -1008,9 +1023,10 @@ define amdgpu_kernel void @global_atomic_inc_noret_i64_offset_addr64(i64 addrspa
 ; CI-NEXT:    v_mov_b32_e32 v2, 42
 ; CI-NEXT:    v_mov_b32_e32 v3, 0
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v4, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s1
+; CI-NEXT:    v_mov_b32_e32 v4, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; CI-NEXT:    v_add_i32_e32 v0, vcc, 40, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; CI-NEXT:    flat_atomic_inc_x2 v[0:1], v[0:1], v[2:3] glc
@@ -1024,9 +1040,10 @@ define amdgpu_kernel void @global_atomic_inc_noret_i64_offset_addr64(i64 addrspa
 ; VI-NEXT:    v_mov_b32_e32 v2, 42
 ; VI-NEXT:    v_mov_b32_e32 v3, 0
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v4, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
-; VI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s1
+; VI-NEXT:    v_mov_b32_e32 v4, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; VI-NEXT:    v_add_u32_e32 v0, vcc, 40, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; VI-NEXT:    flat_atomic_inc_x2 v[0:1], v[0:1], v[2:3] glc
@@ -1040,9 +1057,10 @@ define amdgpu_kernel void @global_atomic_inc_noret_i64_offset_addr64(i64 addrspa
 ; GFX9-NEXT:    v_mov_b32_e32 v2, 42
 ; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v4, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v4, v1, vcc
+; GFX9-NEXT:    v_mov_b32_e32 v5, s1
+; GFX9-NEXT:    v_mov_b32_e32 v4, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v4, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v5, v1, vcc
 ; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, 40, v0
 ; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
 ; GFX9-NEXT:    global_atomic_inc_x2 v[0:1], v[0:1], v[2:3], off glc
@@ -1134,17 +1152,19 @@ define amdgpu_kernel void @flat_atomic_inc_ret_i32_offset_addr64(i32* %out, i32*
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 2
 ; CI-NEXT:    v_mov_b32_e32 v4, 42
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v2, s3
-; CI-NEXT:    v_add_i32_e32 v3, vcc, s2, v0
-; CI-NEXT:    v_addc_u32_e32 v5, vcc, v2, v1, vcc
-; CI-NEXT:    v_add_i32_e32 v2, vcc, 20, v3
-; CI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v5, vcc
-; CI-NEXT:    flat_atomic_inc v2, v[2:3], v4 glc
+; CI-NEXT:    v_mov_b32_e32 v2, s2
+; CI-NEXT:    v_mov_b32_e32 v3, s3
+; CI-NEXT:    v_add_i32_e32 v2, vcc, v2, v0
+; CI-NEXT:    v_addc_u32_e32 v3, vcc, v3, v1, vcc
+; CI-NEXT:    v_add_i32_e32 v2, vcc, 20, v2
+; CI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; CI-NEXT:    flat_atomic_inc v4, v[2:3], v4 glc
 ; CI-NEXT:    v_mov_b32_e32 v3, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; CI-NEXT:    flat_store_dword v[0:1], v2
+; CI-NEXT:    flat_store_dword v[0:1], v4
 ; CI-NEXT:    s_endpgm
 ;
 ; VI-LABEL: flat_atomic_inc_ret_i32_offset_addr64:
@@ -1154,17 +1174,19 @@ define amdgpu_kernel void @flat_atomic_inc_ret_i32_offset_addr64(i32* %out, i32*
 ; VI-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
 ; VI-NEXT:    v_mov_b32_e32 v4, 42
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v2, s3
-; VI-NEXT:    v_add_u32_e32 v3, vcc, s2, v0
-; VI-NEXT:    v_addc_u32_e32 v5, vcc, v2, v1, vcc
-; VI-NEXT:    v_add_u32_e32 v2, vcc, 20, v3
-; VI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v5, vcc
-; VI-NEXT:    flat_atomic_inc v2, v[2:3], v4 glc
+; VI-NEXT:    v_mov_b32_e32 v2, s2
+; VI-NEXT:    v_mov_b32_e32 v3, s3
+; VI-NEXT:    v_add_u32_e32 v2, vcc, v2, v0
+; VI-NEXT:    v_addc_u32_e32 v3, vcc, v3, v1, vcc
+; VI-NEXT:    v_add_u32_e32 v2, vcc, 20, v2
+; VI-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; VI-NEXT:    flat_atomic_inc v4, v[2:3], v4 glc
 ; VI-NEXT:    v_mov_b32_e32 v3, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
+; VI-NEXT:    v_mov_b32_e32 v2, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v2, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; VI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; VI-NEXT:    flat_store_dword v[0:1], v2
+; VI-NEXT:    flat_store_dword v[0:1], v4
 ; VI-NEXT:    s_endpgm
 ;
 ; GFX9-LABEL: flat_atomic_inc_ret_i32_offset_addr64:
@@ -1174,17 +1196,19 @@ define amdgpu_kernel void @flat_atomic_inc_ret_i32_offset_addr64(i32* %out, i32*
 ; GFX9-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
 ; GFX9-NEXT:    v_mov_b32_e32 v4, 42
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v2, s3
-; GFX9-NEXT:    v_add_co_u32_e32 v3, vcc, s2, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v5, vcc, v2, v1, vcc
-; GFX9-NEXT:    v_add_co_u32_e32 v2, vcc, 20, v3
-; GFX9-NEXT:    v_addc_co_u32_e32 v3, vcc, 0, v5, vcc
-; GFX9-NEXT:    flat_atomic_inc v2, v[2:3], v4 glc
+; GFX9-NEXT:    v_mov_b32_e32 v2, s2
+; GFX9-NEXT:    v_mov_b32_e32 v3, s3
+; GFX9-NEXT:    v_add_co_u32_e32 v2, vcc, v2, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v3, vcc, v3, v1, vcc
+; GFX9-NEXT:    v_add_co_u32_e32 v2, vcc, 20, v2
+; GFX9-NEXT:    v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
+; GFX9-NEXT:    flat_atomic_inc v4, v[2:3], v4 glc
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
+; GFX9-NEXT:    v_mov_b32_e32 v2, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v2, v0
 ; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v3, v1, vcc
 ; GFX9-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    flat_store_dword v[0:1], v2
+; GFX9-NEXT:    flat_store_dword v[0:1], v4
 ; GFX9-NEXT:    s_endpgm
   %id = call i32 @llvm.amdgcn.workitem.id.x()
   %gep.tid = getelementptr i32, i32* %ptr, i32 %id
@@ -1201,14 +1225,15 @@ define amdgpu_kernel void @flat_atomic_inc_noret_i32_offset_addr64(i32* %ptr) #0
 ; CI-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; CI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 2
-; CI-NEXT:    v_mov_b32_e32 v2, 42
+; CI-NEXT:    v_mov_b32_e32 v4, 42
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
 ; CI-NEXT:    v_mov_b32_e32 v3, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    v_add_i32_e32 v0, vcc, 20, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; CI-NEXT:    flat_atomic_inc v0, v[0:1], v2 glc
+; CI-NEXT:    flat_atomic_inc v0, v[0:1], v4 glc
 ; CI-NEXT:    s_endpgm
 ;
 ; VI-LABEL: flat_atomic_inc_noret_i32_offset_addr64:
@@ -1216,14 +1241,15 @@ define amdgpu_kernel void @flat_atomic_inc_noret_i32_offset_addr64(i32* %ptr) #0
 ; VI-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; VI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; VI-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
-; VI-NEXT:    v_mov_b32_e32 v2, 42
+; VI-NEXT:    v_mov_b32_e32 v4, 42
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
 ; VI-NEXT:    v_mov_b32_e32 v3, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
+; VI-NEXT:    v_mov_b32_e32 v2, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v2, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; VI-NEXT:    v_add_u32_e32 v0, vcc, 20, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; VI-NEXT:    flat_atomic_inc v0, v[0:1], v2 glc
+; VI-NEXT:    flat_atomic_inc v0, v[0:1], v4 glc
 ; VI-NEXT:    s_endpgm
 ;
 ; GFX9-LABEL: flat_atomic_inc_noret_i32_offset_addr64:
@@ -1231,14 +1257,15 @@ define amdgpu_kernel void @flat_atomic_inc_noret_i32_offset_addr64(i32* %ptr) #0
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x0
 ; GFX9-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; GFX9-NEXT:    v_lshlrev_b64 v[0:1], 2, v[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v2, 42
+; GFX9-NEXT:    v_mov_b32_e32 v4, 42
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
+; GFX9-NEXT:    v_mov_b32_e32 v2, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v2, v0
 ; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v3, v1, vcc
 ; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, 20, v0
 ; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
-; GFX9-NEXT:    flat_atomic_inc v0, v[0:1], v2 glc
+; GFX9-NEXT:    flat_atomic_inc v0, v[0:1], v4 glc
 ; GFX9-NEXT:    s_endpgm
   %id = call i32 @llvm.amdgcn.workitem.id.x()
   %gep.tid = getelementptr i32, i32* %ptr, i32 %id
@@ -1402,15 +1429,17 @@ define amdgpu_kernel void @flat_atomic_inc_ret_i64_offset_addr64(i64* %out, i64*
 ; CI-NEXT:    v_mov_b32_e32 v2, 42
 ; CI-NEXT:    v_mov_b32_e32 v3, 0
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v4, s3
-; CI-NEXT:    v_add_i32_e32 v5, vcc, s2, v0
-; CI-NEXT:    v_addc_u32_e32 v6, vcc, v4, v1, vcc
-; CI-NEXT:    v_add_i32_e32 v4, vcc, 40, v5
-; CI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v6, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s3
+; CI-NEXT:    v_mov_b32_e32 v4, s2
+; CI-NEXT:    v_add_i32_e32 v4, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v5, vcc, v5, v1, vcc
+; CI-NEXT:    v_add_i32_e32 v4, vcc, 40, v4
+; CI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v5, vcc
 ; CI-NEXT:    flat_atomic_inc_x2 v[2:3], v[4:5], v[2:3] glc
-; CI-NEXT:    v_mov_b32_e32 v4, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s1
+; CI-NEXT:    v_mov_b32_e32 v4, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; CI-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; CI-NEXT:    s_endpgm
@@ -1423,15 +1452,17 @@ define amdgpu_kernel void @flat_atomic_inc_ret_i64_offset_addr64(i64* %out, i64*
 ; VI-NEXT:    v_mov_b32_e32 v2, 42
 ; VI-NEXT:    v_mov_b32_e32 v3, 0
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v4, s3
-; VI-NEXT:    v_add_u32_e32 v5, vcc, s2, v0
-; VI-NEXT:    v_addc_u32_e32 v6, vcc, v4, v1, vcc
-; VI-NEXT:    v_add_u32_e32 v4, vcc, 40, v5
-; VI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v6, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s3
+; VI-NEXT:    v_mov_b32_e32 v4, s2
+; VI-NEXT:    v_add_u32_e32 v4, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v5, vcc, v5, v1, vcc
+; VI-NEXT:    v_add_u32_e32 v4, vcc, 40, v4
+; VI-NEXT:    v_addc_u32_e32 v5, vcc, 0, v5, vcc
 ; VI-NEXT:    flat_atomic_inc_x2 v[2:3], v[4:5], v[2:3] glc
-; VI-NEXT:    v_mov_b32_e32 v4, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
-; VI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s1
+; VI-NEXT:    v_mov_b32_e32 v4, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; VI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; VI-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; VI-NEXT:    s_endpgm
@@ -1444,15 +1475,17 @@ define amdgpu_kernel void @flat_atomic_inc_ret_i64_offset_addr64(i64* %out, i64*
 ; GFX9-NEXT:    v_mov_b32_e32 v2, 42
 ; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v4, s3
-; GFX9-NEXT:    v_add_co_u32_e32 v5, vcc, s2, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v6, vcc, v4, v1, vcc
-; GFX9-NEXT:    v_add_co_u32_e32 v4, vcc, 40, v5
-; GFX9-NEXT:    v_addc_co_u32_e32 v5, vcc, 0, v6, vcc
+; GFX9-NEXT:    v_mov_b32_e32 v5, s3
+; GFX9-NEXT:    v_mov_b32_e32 v4, s2
+; GFX9-NEXT:    v_add_co_u32_e32 v4, vcc, v4, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v5, vcc, v5, v1, vcc
+; GFX9-NEXT:    v_add_co_u32_e32 v4, vcc, 40, v4
+; GFX9-NEXT:    v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
 ; GFX9-NEXT:    flat_atomic_inc_x2 v[2:3], v[4:5], v[2:3] glc
-; GFX9-NEXT:    v_mov_b32_e32 v4, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v4, v1, vcc
+; GFX9-NEXT:    v_mov_b32_e32 v5, s1
+; GFX9-NEXT:    v_mov_b32_e32 v4, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v4, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v5, v1, vcc
 ; GFX9-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; GFX9-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; GFX9-NEXT:    s_endpgm
@@ -1474,9 +1507,10 @@ define amdgpu_kernel void @flat_atomic_inc_noret_i64_offset_addr64(i64* %ptr) #0
 ; CI-NEXT:    v_mov_b32_e32 v2, 42
 ; CI-NEXT:    v_mov_b32_e32 v3, 0
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_mov_b32_e32 v4, s1
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v5, s1
+; CI-NEXT:    v_mov_b32_e32 v4, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v4, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; CI-NEXT:    v_add_i32_e32 v0, vcc, 40, v0
 ; CI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; CI-NEXT:    flat_atomic_inc_x2 v[0:1], v[0:1], v[2:3] glc
@@ -1490,9 +1524,10 @@ define amdgpu_kernel void @flat_atomic_inc_noret_i64_offset_addr64(i64* %ptr) #0
 ; VI-NEXT:    v_mov_b32_e32 v2, 42
 ; VI-NEXT:    v_mov_b32_e32 v3, 0
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    v_mov_b32_e32 v4, s1
-; VI-NEXT:    v_add_u32_e32 v0, vcc, s0, v0
-; VI-NEXT:    v_addc_u32_e32 v1, vcc, v4, v1, vcc
+; VI-NEXT:    v_mov_b32_e32 v5, s1
+; VI-NEXT:    v_mov_b32_e32 v4, s0
+; VI-NEXT:    v_add_u32_e32 v0, vcc, v4, v0
+; VI-NEXT:    v_addc_u32_e32 v1, vcc, v5, v1, vcc
 ; VI-NEXT:    v_add_u32_e32 v0, vcc, 40, v0
 ; VI-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; VI-NEXT:    flat_atomic_inc_x2 v[0:1], v[0:1], v[2:3] glc
@@ -1506,9 +1541,10 @@ define amdgpu_kernel void @flat_atomic_inc_noret_i64_offset_addr64(i64* %ptr) #0
 ; GFX9-NEXT:    v_mov_b32_e32 v2, 42
 ; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v4, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v4, v1, vcc
+; GFX9-NEXT:    v_mov_b32_e32 v5, s1
+; GFX9-NEXT:    v_mov_b32_e32 v4, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v4, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v5, v1, vcc
 ; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, 40, v0
 ; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
 ; GFX9-NEXT:    flat_atomic_inc_x2 v[0:1], v[0:1], v[2:3] glc

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll
index defd9522502a..3dedbec19679 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll
@@ -11,9 +11,10 @@ define amdgpu_kernel void @is_private_vgpr(i8* addrspace(1)* %ptr.ptr) {
 ; CI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 3
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_mov_b32_e32 v2, s1
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v2, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v3, s1
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    flat_load_dwordx2 v[0:1], v[0:1]
 ; CI-NEXT:    s_load_dword s0, s[4:5], 0x11
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -28,9 +29,10 @@ define amdgpu_kernel void @is_private_vgpr(i8* addrspace(1)* %ptr.ptr) {
 ; GFX9-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; GFX9-NEXT:    v_lshlrev_b64 v[0:1], 3, v[0:1]
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v2, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
+; GFX9-NEXT:    v_mov_b32_e32 v3, s1
+; GFX9-NEXT:    v_mov_b32_e32 v2, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v2, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v3, v1, vcc
 ; GFX9-NEXT:    global_load_dwordx2 v[0:1], v[0:1], off
 ; GFX9-NEXT:    s_getreg_b32 s0, hwreg(HW_REG_SH_MEM_BASES, 0, 16)
 ; GFX9-NEXT:    s_lshl_b32 s0, s0, 16

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll
index 16e03733c782..f0eb57cef219 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll
@@ -11,9 +11,10 @@ define amdgpu_kernel void @is_local_vgpr(i8* addrspace(1)* %ptr.ptr) {
 ; CI-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; CI-NEXT:    v_lshl_b64 v[0:1], v[0:1], 3
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_add_i32_e32 v0, vcc, s0, v0
-; CI-NEXT:    v_mov_b32_e32 v2, s1
-; CI-NEXT:    v_addc_u32_e32 v1, vcc, v2, v1, vcc
+; CI-NEXT:    v_mov_b32_e32 v3, s1
+; CI-NEXT:    v_mov_b32_e32 v2, s0
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v2, v0
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v3, v1, vcc
 ; CI-NEXT:    flat_load_dwordx2 v[0:1], v[0:1]
 ; CI-NEXT:    s_load_dword s0, s[4:5], 0x10
 ; CI-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -28,9 +29,10 @@ define amdgpu_kernel void @is_local_vgpr(i8* addrspace(1)* %ptr.ptr) {
 ; GFX9-NEXT:    v_ashrrev_i32_e32 v1, 31, v0
 ; GFX9-NEXT:    v_lshlrev_b64 v[0:1], 3, v[0:1]
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v2, s1
-; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, s0, v0
-; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
+; GFX9-NEXT:    v_mov_b32_e32 v3, s1
+; GFX9-NEXT:    v_mov_b32_e32 v2, s0
+; GFX9-NEXT:    v_add_co_u32_e32 v0, vcc, v2, v0
+; GFX9-NEXT:    v_addc_co_u32_e32 v1, vcc, v3, v1, vcc
 ; GFX9-NEXT:    global_load_dwordx2 v[0:1], v[0:1], off
 ; GFX9-NEXT:    s_getreg_b32 s0, hwreg(HW_REG_SH_MEM_BASES, 16, 16)
 ; GFX9-NEXT:    s_lshl_b32 s0, s0, 16

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-add.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-add.mir
index fd4f1ec4f4f8..8209bc40483e 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-add.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-add.mir
@@ -27,7 +27,8 @@ body: |
     ; CHECK-LABEL: name: add_s32_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[ADD:%[0-9]+]]:vgpr(s32) = G_ADD [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[ADD:%[0-9]+]]:vgpr(s32) = G_ADD [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_ADD %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.class.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.class.mir
index 6b832e4c46e7..4db1fea18388 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.class.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.class.mir
@@ -12,7 +12,9 @@ body: |
     ; CHECK-LABEL: name: class_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s64) = COPY $sgpr0_sgpr1
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
-    ; CHECK: [[INT:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), [[COPY]](s64), [[COPY1]](s32)
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s64) = COPY [[COPY]](s64)
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), [[COPY2]](s64), [[COPY3]](s32)
     %0:_(s64) = COPY $sgpr0_sgpr1
     %1:_(s32) = COPY $sgpr2
     %2:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), %0, %1
@@ -29,7 +31,8 @@ body: |
     ; CHECK-LABEL: name: class_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s64) = COPY $sgpr0_sgpr1
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[INT:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), [[COPY]](s64), [[COPY1]](s32)
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s64) = COPY [[COPY]](s64)
+    ; CHECK: [[INT:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), [[COPY2]](s64), [[COPY1]](s32)
     %0:_(s64) = COPY $sgpr0_sgpr1
     %1:_(s32) = COPY $vgpr0
     %2:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), %0, %1
@@ -45,7 +48,8 @@ body: |
     ; CHECK-LABEL: name: class_vs
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[INT:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), [[COPY]](s64), [[COPY1]](s32)
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), [[COPY]](s64), [[COPY2]](s32)
     %0:_(s64) = COPY $vgpr0_vgpr1
     %1:_(s32) = COPY $sgpr0
     %2:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.class), %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.cvt.pkrtz.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.cvt.pkrtz.mir
index 4077bcb5bdb4..b395966b4abb 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.cvt.pkrtz.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.cvt.pkrtz.mir
@@ -12,8 +12,9 @@ body: |
     ; CHECK-LABEL: name: cvt_pkrtz_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.cvt.pkrtz), [[COPY]](s32), [[COPY2]](s32)
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.cvt.pkrtz), [[COPY2]](s32), [[COPY3]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.cvt.pkrtz), %0, %1
@@ -28,7 +29,8 @@ body: |
     ; CHECK-LABEL: name: cvt_pkrtz_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.cvt.pkrtz), [[COPY]](s32), [[COPY1]](s32)
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.cvt.pkrtz), [[COPY2]](s32), [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.cvt.pkrtz), %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.fmas.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.fmas.mir
index 87dc8f5466af..c87fc324e787 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.fmas.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.fmas.mir
@@ -17,10 +17,11 @@ body: |
     ; CHECK: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 0
     ; CHECK: [[ICMP:%[0-9]+]]:sgpr(s32) = G_ICMP intpred(eq), [[COPY3]](s32), [[C]]
     ; CHECK: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[ICMP]](s32)
-    ; CHECK: [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
-    ; CHECK: [[COPY6:%[0-9]+]]:vcc(s1) = COPY [[TRUNC]](s1)
-    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[COPY]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s1)
+    ; CHECK: [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[COPY6:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
+    ; CHECK: [[COPY7:%[0-9]+]]:vcc(s1) = COPY [[TRUNC]](s1)
+    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s1)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = COPY $sgpr2
@@ -44,9 +45,10 @@ body: |
     ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 0
     ; CHECK: [[ICMP:%[0-9]+]]:vcc(s1) = G_ICMP intpred(eq), [[COPY3]](s32), [[C]]
-    ; CHECK: [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
-    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[COPY]](s32), [[COPY4]](s32), [[COPY5]](s32), [[ICMP]](s1)
+    ; CHECK: [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[COPY6:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[ICMP]](s1)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = COPY $sgpr2

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.scale.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.scale.mir
index 60e1c8f5cd13..284e0415b726 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.scale.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.scale.mir
@@ -12,7 +12,9 @@ body: |
     ; CHECK-LABEL: name: div_scale_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s32), [[COPY1]](s32), 0
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY2]](s32), [[COPY3]](s32), 0
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32), %3:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), %0, %1, 0
@@ -28,7 +30,8 @@ body: |
     ; CHECK-LABEL: name: div_scale_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s32), [[COPY1]](s32), 0
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY2]](s32), [[COPY1]](s32), 0
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32), %3:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), %0, %1, 0
@@ -44,7 +47,8 @@ body: |
     ; CHECK-LABEL: name: div_scale_vs
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s32), [[COPY1]](s32), 0
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32), [[INT1:%[0-9]+]]:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s32), [[COPY2]](s32), 0
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s32), %3:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), %0, %1, 0

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fcmp.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fcmp.mir
index 0e5dd439e608..8cee284db308 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fcmp.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fcmp.mir
@@ -12,7 +12,9 @@ body: |
     ; CHECK-LABEL: name: fcmp_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.fcmp), [[COPY]](s32), [[COPY1]](s32), 1
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.fcmp), [[COPY2]](s32), [[COPY3]](s32), 1
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.fcmp), %0, %1, 1
@@ -28,7 +30,8 @@ body: |
     ; CHECK-LABEL: name: fcmp_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.fcmp), [[COPY]](s32), [[COPY1]](s32), 1
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.fcmp), [[COPY2]](s32), [[COPY1]](s32), 1
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.fcmp), %0, %1, 1
@@ -44,7 +47,8 @@ body: |
     ; CHECK-LABEL: name: fcmp_vs
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.fcmp), [[COPY]](s32), [[COPY1]](s32), 1
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.fcmp), [[COPY]](s32), [[COPY2]](s32), 1
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.fcmp), %0, %1, 1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fmul.legacy.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fmul.legacy.mir
index d80bf9a22ca5..754394420b54 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fmul.legacy.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.fmul.legacy.mir
@@ -12,8 +12,9 @@ body: |
     ; CHECK-LABEL: name: fmul_legacy_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmul.legacy), [[COPY]](s32), [[COPY2]](s32)
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmul.legacy), [[COPY2]](s32), [[COPY3]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmul.legacy), %0, %1
@@ -28,7 +29,8 @@ body: |
     ; CHECK-LABEL: name: fmul_legacy_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmul.legacy), [[COPY]](s32), [[COPY1]](s32)
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmul.legacy), [[COPY2]](s32), [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmul.legacy), %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.icmp.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.icmp.mir
index 393903697991..f2cfdf67b849 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.icmp.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.icmp.mir
@@ -12,7 +12,9 @@ body: |
     ; CHECK-LABEL: name: icmp_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), [[COPY]](s32), [[COPY1]](s32), 32
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), [[COPY2]](s32), [[COPY3]](s32), 32
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), %0, %1, 32
@@ -28,7 +30,8 @@ body: |
     ; CHECK-LABEL: name: icmp_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), [[COPY]](s32), [[COPY1]](s32), 32
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), [[COPY2]](s32), [[COPY1]](s32), 32
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), %0, %1, 32
@@ -44,7 +47,8 @@ body: |
     ; CHECK-LABEL: name: icmp_vs
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), [[COPY]](s32), [[COPY1]](s32), 32
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[INT:%[0-9]+]]:sgpr(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), [[COPY]](s32), [[COPY2]](s32), 32
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), %0, %1, 32

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgpu-ffbh-u32.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgpu-ffbh-u32.mir
index a766df2a3d00..96a9383f0368 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgpu-ffbh-u32.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgpu-ffbh-u32.mir
@@ -12,7 +12,8 @@ body: |
 
     ; CHECK-LABEL: name: ffbh_u32_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[AMDGPU_FFBH_U32_:%[0-9]+]]:vgpr(s32) = G_AMDGPU_FFBH_U32 [[COPY]](s32)
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[AMDGPU_FFBH_U32_:%[0-9]+]]:vgpr(s32) = G_AMDGPU_FFBH_U32 [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_AMDGPU_FFBH_U32 %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-and.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-and.mir
index fbfadad6c55d..bd0cde1bbccc 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-and.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-and.mir
@@ -28,7 +28,8 @@ body: |
     ; CHECK-LABEL: name: and_s32_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[AND:%[0-9]+]]:vgpr(s32) = G_AND [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[AND:%[0-9]+]]:vgpr(s32) = G_AND [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_AND %0, %1
@@ -555,7 +556,8 @@ body: |
     ; CHECK-LABEL: name: and_v2s16_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(<2 x s16>) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(<2 x s16>) = COPY $vgpr0
-    ; CHECK: [[AND:%[0-9]+]]:vgpr(<2 x s16>) = G_AND [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(<2 x s16>) = COPY [[COPY]](<2 x s16>)
+    ; CHECK: [[AND:%[0-9]+]]:vgpr(<2 x s16>) = G_AND [[COPY2]], [[COPY1]]
     %0:_(<2 x s16>) = COPY $sgpr0
     %1:_(<2 x s16>) = COPY $vgpr0
     %2:_(<2 x s16>) = G_AND %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ashr.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ashr.mir
index f66b6f95ab00..c75ad93ad11b 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ashr.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ashr.mir
@@ -12,7 +12,7 @@ body: |
     ; CHECK-LABEL: name: ashr_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[ASHR:%[0-9]+]]:sgpr(s32) = G_ASHR [[COPY]], [[COPY1]]
+    ; CHECK: [[ASHR:%[0-9]+]]:sgpr(s32) = G_ASHR [[COPY]], [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = G_ASHR %0, %1
@@ -28,7 +28,8 @@ body: |
     ; CHECK-LABEL: name: ashr_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[COPY2]], [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_ASHR %0, %1
@@ -45,7 +46,7 @@ body: |
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[COPY]], [[COPY2]]
+    ; CHECK: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[COPY]], [[COPY2]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s32) = G_ASHR %0, %1
@@ -61,7 +62,7 @@ body: |
     ; CHECK-LABEL: name: ashr_vv
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
-    ; CHECK: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[COPY]], [[COPY1]]
+    ; CHECK: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[COPY]], [[COPY1]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $vgpr1
     %2:_(s32) = G_ASHR %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fadd.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fadd.mir
index 017d652ce343..db136a8ecacb 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fadd.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fadd.mir
@@ -12,8 +12,9 @@ body: |
     ; CHECK-LABEL: name: fadd_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[FADD:%[0-9]+]]:vgpr(s32) = G_FADD [[COPY]], [[COPY2]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[FADD:%[0-9]+]]:vgpr(s32) = G_FADD [[COPY2]], [[COPY3]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = G_FADD %0, %1
@@ -29,7 +30,8 @@ body: |
     ; CHECK-LABEL: name: fadd_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[FADD:%[0-9]+]]:vgpr(s32) = G_FADD [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FADD:%[0-9]+]]:vgpr(s32) = G_FADD [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_FADD %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fcanonicalize.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fcanonicalize.mir
index 2f8786740f19..7566004a62a7 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fcanonicalize.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fcanonicalize.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0_sgpr1
     ; CHECK-LABEL: name: fcanonicalize_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[FCANONICALIZE:%[0-9]+]]:vgpr(s32) = G_FCANONICALIZE [[COPY]]
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FCANONICALIZE:%[0-9]+]]:vgpr(s32) = G_FCANONICALIZE [[COPY1]]
     ; CHECK: $vgpr0 = COPY [[FCANONICALIZE]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_FCANONICALIZE %0

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fceil.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fceil.mir
index 953db08a7039..04b632ed77fa 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fceil.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fceil.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: fceil_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[FCEIL_:%[0-9]+]]:vgpr(s32) = G_FCEIL [[COPY]]
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FCEIL:%[0-9]+]]:vgpr(s32) = G_FCEIL [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_FCEIL %0
 ...
@@ -25,7 +26,7 @@ body: |
     liveins: $vgpr0
     ; CHECK-LABEL: name: fceil_v
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[FCEIL_:%[0-9]+]]:vgpr(s32) = G_FCEIL [[COPY]]
+    ; CHECK: [[FCEIL:%[0-9]+]]:vgpr(s32) = G_FCEIL [[COPY]]
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = G_FCEIL %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fexp2.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fexp2.mir
index 1b56ca3025c4..daec5789858a 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fexp2.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fexp2.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: fexp2_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[FEXP2_:%[0-9]+]]:vgpr(s32) = G_FEXP2 [[COPY]]
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FEXP2_:%[0-9]+]]:vgpr(s32) = G_FEXP2 [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_FEXP2 %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-flog2.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-flog2.mir
index 2915137a929c..caf7c087b077 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-flog2.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-flog2.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: flog2_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[FLOG2_:%[0-9]+]]:vgpr(s32) = G_FLOG2 [[COPY]]
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FLOG2_:%[0-9]+]]:vgpr(s32) = G_FLOG2 [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_FLOG2 %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fma.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fma.mir
index 3f0dc2237dca..9e076c5ac145 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fma.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fma.mir
@@ -13,9 +13,10 @@ body: |
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
     ; CHECK: [[COPY2:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
-    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
-    ; CHECK: [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY]], [[COPY3]], [[COPY4]]
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[COPY5:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
+    ; CHECK: [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY3]], [[COPY4]], [[COPY5]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = COPY $sgpr2
@@ -51,8 +52,9 @@ body: |
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY2:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
-    ; CHECK: [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY]], [[COPY1]], [[COPY3]]
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY2]](s32)
+    ; CHECK: [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY3]], [[COPY1]], [[COPY4]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = COPY $sgpr1
@@ -69,8 +71,9 @@ body: |
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
     ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY]], [[COPY3]], [[COPY2]]
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY4:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY3]], [[COPY4]], [[COPY2]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = COPY $vgpr0
@@ -123,7 +126,8 @@ body: |
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
-    ; CHECK: [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY]], [[COPY1]], [[COPY2]]
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FMA:%[0-9]+]]:vgpr(s32) = G_FMA [[COPY3]], [[COPY1]], [[COPY2]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = COPY $vgpr1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fmul.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fmul.mir
index a436d2d3eeb5..489aa7fc4699 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fmul.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fmul.mir
@@ -12,8 +12,9 @@ body: |
     ; CHECK-LABEL: name: fmul_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY]], [[COPY2]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY2]], [[COPY3]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = G_FMUL %0, %1
@@ -29,7 +30,8 @@ body: |
     ; CHECK-LABEL: name: fmul_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FMUL:%[0-9]+]]:vgpr(s32) = G_FMUL [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_FMUL %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fpext.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fpext.mir
index 123b3558da64..cae636091c12 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fpext.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fpext.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: fpext_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[FPEXT:%[0-9]+]]:vgpr(s64) = G_FPEXT [[COPY]](s32)
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FPEXT:%[0-9]+]]:vgpr(s64) = G_FPEXT [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s64) = G_FPEXT %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptosi.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptosi.mir
index 064b1819eb0e..eb014fd4ada2 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptosi.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptosi.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: fptosi_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[FPTOSI:%[0-9]+]]:vgpr(s32) = G_FPTOSI [[COPY]](s32)
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FPTOSI:%[0-9]+]]:vgpr(s32) = G_FPTOSI [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_FPTOSI %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptoui.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptoui.mir
index f7cf58a75fe8..86a9a9e8a512 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptoui.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptoui.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: fptoui_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[FPTOUI:%[0-9]+]]:vgpr(s32) = G_FPTOUI [[COPY]](s32)
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FPTOUI:%[0-9]+]]:vgpr(s32) = G_FPTOUI [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_FPTOUI %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptrunc.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptrunc.mir
index 02cb49a4b8f4..f29a0d158f83 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptrunc.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fptrunc.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0_sgpr1
     ; CHECK-LABEL: name: fptrunc_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s64) = COPY $sgpr0_sgpr1
-    ; CHECK: [[FPTRUNC:%[0-9]+]]:vgpr(s32) = G_FPTRUNC [[COPY]](s64)
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY [[COPY]](s64)
+    ; CHECK: [[FPTRUNC:%[0-9]+]]:vgpr(s32) = G_FPTRUNC [[COPY1]](s64)
     %0:_(s64) = COPY $sgpr0_sgpr1
     %1:_(s32) = G_FPTRUNC %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-frint.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-frint.mir
index edf201967fe6..62cce6f39ba6 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-frint.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-frint.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: frint_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[FRINT_:%[0-9]+]]:vgpr(s32) = G_FRINT [[COPY]]
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FRINT:%[0-9]+]]:vgpr(s32) = G_FRINT [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_FRINT %0
 ...
@@ -25,7 +26,7 @@ body: |
     liveins: $vgpr0
     ; CHECK-LABEL: name: frint_v
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[FRINT_:%[0-9]+]]:vgpr(s32) = G_FRINT [[COPY]]
+    ; CHECK: [[FRINT:%[0-9]+]]:vgpr(s32) = G_FRINT [[COPY]]
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = G_FRINT %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsqrt.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsqrt.mir
index a2e3a4320597..e9cc8c32086b 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsqrt.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsqrt.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0_sgpr1
     ; CHECK-LABEL: name: fsqrt_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[FSQRT:%[0-9]+]]:vgpr(s32) = G_FSQRT [[COPY]]
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FSQRT:%[0-9]+]]:vgpr(s32) = G_FSQRT [[COPY1]]
     ; CHECK: $vgpr0 = COPY [[FSQRT]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_FSQRT %0

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsub.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsub.mir
index a924e5ee4976..0cdfd80b66bd 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsub.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fsub.mir
@@ -12,8 +12,9 @@ body: |
     ; CHECK-LABEL: name: fsub_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[FSUB:%[0-9]+]]:vgpr(s32) = G_FSUB [[COPY]], [[COPY2]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; CHECK: [[FSUB:%[0-9]+]]:vgpr(s32) = G_FSUB [[COPY2]], [[COPY3]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = G_FSUB %0, %1
@@ -29,7 +30,8 @@ body: |
     ; CHECK-LABEL: name: fsub_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[FSUB:%[0-9]+]]:vgpr(s32) = G_FSUB [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[FSUB:%[0-9]+]]:vgpr(s32) = G_FSUB [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_FSUB %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-intrinsic-trunc.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-intrinsic-trunc.mir
index fef4d4921175..d9e6021ebfa5 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-intrinsic-trunc.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-intrinsic-trunc.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: intrinsic_trunc_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[INTRINSIC_TRUNC:%[0-9]+]]:vgpr(s32) = G_INTRINSIC_TRUNC [[COPY]]
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[INTRINSIC_TRUNC:%[0-9]+]]:vgpr(s32) = G_INTRINSIC_TRUNC [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_INTRINSIC_TRUNC %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-lshr.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-lshr.mir
index a4e20fc5e51c..666b6145b3df 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-lshr.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-lshr.mir
@@ -12,7 +12,7 @@ body: |
     ; CHECK-LABEL: name: lshr_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[LSHR:%[0-9]+]]:sgpr(s32) = G_LSHR [[COPY]], [[COPY1]]
+    ; CHECK: [[LSHR:%[0-9]+]]:sgpr(s32) = G_LSHR [[COPY]], [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = G_LSHR %0, %1
@@ -27,7 +27,8 @@ body: |
     ; CHECK-LABEL: name: lshr_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[LSHR:%[0-9]+]]:vgpr(s32) = G_LSHR [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[LSHR:%[0-9]+]]:vgpr(s32) = G_LSHR [[COPY2]], [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_LSHR %0, %1
@@ -43,7 +44,7 @@ body: |
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[LSHR:%[0-9]+]]:vgpr(s32) = G_LSHR [[COPY]], [[COPY2]]
+    ; CHECK: [[LSHR:%[0-9]+]]:vgpr(s32) = G_LSHR [[COPY]], [[COPY2]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s32) = G_LSHR %0, %1
@@ -58,7 +59,7 @@ body: |
     ; CHECK-LABEL: name: lshr_vv
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
-    ; CHECK: [[LSHR:%[0-9]+]]:vgpr(s32) = G_LSHR [[COPY]], [[COPY1]]
+    ; CHECK: [[LSHR:%[0-9]+]]:vgpr(s32) = G_LSHR [[COPY]], [[COPY1]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $vgpr1
     %2:_(s32) = G_LSHR %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mul.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mul.mir
index 16b89ba7d5ce..ef1d639dc57d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mul.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mul.mir
@@ -27,7 +27,8 @@ body: |
     ; CHECK-LABEL: name: mul_s32_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[MUL:%[0-9]+]]:vgpr(s32) = G_MUL [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[MUL:%[0-9]+]]:vgpr(s32) = G_MUL [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_MUL %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-or.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-or.mir
index fd9d888de935..acf0602a1cc0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-or.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-or.mir
@@ -28,7 +28,8 @@ body: |
     ; CHECK-LABEL: name: or_s32_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[OR:%[0-9]+]]:vgpr(s32) = G_OR [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[OR:%[0-9]+]]:vgpr(s32) = G_OR [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_OR %0, %1
@@ -706,7 +707,8 @@ body: |
     ; CHECK-LABEL: name: or_v2s16_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(<2 x s16>) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(<2 x s16>) = COPY $vgpr0
-    ; CHECK: [[OR:%[0-9]+]]:vgpr(<2 x s16>) = G_OR [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(<2 x s16>) = COPY [[COPY]](<2 x s16>)
+    ; CHECK: [[OR:%[0-9]+]]:vgpr(<2 x s16>) = G_OR [[COPY2]], [[COPY1]]
     %0:_(<2 x s16>) = COPY $sgpr0
     %1:_(<2 x s16>) = COPY $vgpr0
     %2:_(<2 x s16>) = G_OR %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-shl.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-shl.mir
index 1bcde4fb4a3f..7b1237bac8b5 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-shl.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-shl.mir
@@ -12,7 +12,7 @@ body: |
     ; CHECK-LABEL: name: shl_ss
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; CHECK: [[SHL:%[0-9]+]]:sgpr(s32) = G_SHL [[COPY]], [[COPY1]]
+    ; CHECK: [[SHL:%[0-9]+]]:sgpr(s32) = G_SHL [[COPY]], [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = G_SHL %0, %1
@@ -28,7 +28,8 @@ body: |
     ; CHECK-LABEL: name: shl_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[SHL:%[0-9]+]]:vgpr(s32) = G_SHL [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[SHL:%[0-9]+]]:vgpr(s32) = G_SHL [[COPY2]], [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_SHL %0, %1
@@ -45,7 +46,7 @@ body: |
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; CHECK: [[SHL:%[0-9]+]]:vgpr(s32) = G_SHL [[COPY]], [[COPY2]]
+    ; CHECK: [[SHL:%[0-9]+]]:vgpr(s32) = G_SHL [[COPY]], [[COPY2]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s32) = G_SHL %0, %1
@@ -61,7 +62,7 @@ body: |
     ; CHECK-LABEL: name: shl_vv
     ; CHECK: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
-    ; CHECK: [[SHL:%[0-9]+]]:vgpr(s32) = G_SHL [[COPY]], [[COPY1]]
+    ; CHECK: [[SHL:%[0-9]+]]:vgpr(s32) = G_SHL [[COPY]], [[COPY1]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $vgpr1
     %2:_(s32) = G_SHL %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sitofp.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sitofp.mir
index e48292127d54..b32e17244a3f 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sitofp.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sitofp.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: sitofp_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[SITOFP:%[0-9]+]]:vgpr(s32) = G_SITOFP [[COPY]](s32)
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[SITOFP:%[0-9]+]]:vgpr(s32) = G_SITOFP [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_SITOFP %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smax.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smax.mir
index 77fcaa403b70..ead940d9e88b 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smax.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smax.mir
@@ -36,11 +36,13 @@ body: |
     ; FAST-LABEL: name: smax_s32_sv
     ; FAST: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; FAST: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; FAST: [[SMAX:%[0-9]+]]:vgpr(s32) = G_SMAX [[COPY]], [[COPY1]]
+    ; FAST: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; FAST: [[SMAX:%[0-9]+]]:vgpr(s32) = G_SMAX [[COPY2]], [[COPY1]]
     ; GREEDY-LABEL: name: smax_s32_sv
     ; GREEDY: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GREEDY: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; GREEDY: [[SMAX:%[0-9]+]]:vgpr(s32) = G_SMAX [[COPY]], [[COPY1]]
+    ; GREEDY: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GREEDY: [[SMAX:%[0-9]+]]:vgpr(s32) = G_SMAX [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_SMAX %0, %1
@@ -62,7 +64,8 @@ body: |
     ; GREEDY-LABEL: name: smax_s32_vs
     ; GREEDY: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; GREEDY: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; GREEDY: [[SMAX:%[0-9]+]]:vgpr(s32) = G_SMAX [[COPY]], [[COPY1]]
+    ; GREEDY: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; GREEDY: [[SMAX:%[0-9]+]]:vgpr(s32) = G_SMAX [[COPY]], [[COPY2]]
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s32) = G_SMAX %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smin.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smin.mir
index 7869c9bc5130..ad76fedfd6ab 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smin.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smin.mir
@@ -36,11 +36,13 @@ body: |
     ; FAST-LABEL: name: smin_s32_sv
     ; FAST: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; FAST: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; FAST: [[SMIN:%[0-9]+]]:vgpr(s32) = G_SMIN [[COPY]], [[COPY1]]
+    ; FAST: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; FAST: [[SMIN:%[0-9]+]]:vgpr(s32) = G_SMIN [[COPY2]], [[COPY1]]
     ; GREEDY-LABEL: name: smin_s32_sv
     ; GREEDY: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GREEDY: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; GREEDY: [[SMIN:%[0-9]+]]:vgpr(s32) = G_SMIN [[COPY]], [[COPY1]]
+    ; GREEDY: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GREEDY: [[SMIN:%[0-9]+]]:vgpr(s32) = G_SMIN [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_SMIN %0, %1
@@ -62,7 +64,8 @@ body: |
     ; GREEDY-LABEL: name: smin_s32_vs
     ; GREEDY: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; GREEDY: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; GREEDY: [[SMIN:%[0-9]+]]:vgpr(s32) = G_SMIN [[COPY]], [[COPY1]]
+    ; GREEDY: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; GREEDY: [[SMIN:%[0-9]+]]:vgpr(s32) = G_SMIN [[COPY]], [[COPY2]]
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s32) = G_SMIN %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smulh.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smulh.mir
index 4401a3c8bd9b..d1231bb996b4 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smulh.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smulh.mir
@@ -16,8 +16,9 @@ body: |
     ; GFX6-LABEL: name: smulh_s32_ss
     ; GFX6: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GFX6: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; GFX6: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; GFX6: [[SMULH:%[0-9]+]]:vgpr(s32) = G_SMULH [[COPY]], [[COPY2]]
+    ; GFX6: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GFX6: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; GFX6: [[SMULH:%[0-9]+]]:vgpr(s32) = G_SMULH [[COPY2]], [[COPY3]]
     ; GFX9-LABEL: name: smulh_s32_ss
     ; GFX9: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GFX9: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
@@ -38,11 +39,13 @@ body: |
     ; GFX6-LABEL: name: smulh_s32_sv
     ; GFX6: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GFX6: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; GFX6: [[SMULH:%[0-9]+]]:vgpr(s32) = G_SMULH [[COPY]], [[COPY1]]
+    ; GFX6: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GFX6: [[SMULH:%[0-9]+]]:vgpr(s32) = G_SMULH [[COPY2]], [[COPY1]]
     ; GFX9-LABEL: name: smulh_s32_sv
     ; GFX9: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GFX9: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; GFX9: [[SMULH:%[0-9]+]]:vgpr(s32) = G_SMULH [[COPY]], [[COPY1]]
+    ; GFX9: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GFX9: [[SMULH:%[0-9]+]]:vgpr(s32) = G_SMULH [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_SMULH %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sub.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sub.mir
index 49562856af76..20da45894379 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sub.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sub.mir
@@ -27,7 +27,8 @@ body: |
     ; CHECK-LABEL: name: sub_s32_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[SUB:%[0-9]+]]:vgpr(s32) = G_SUB [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[SUB:%[0-9]+]]:vgpr(s32) = G_SUB [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_SUB %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-uitofp.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-uitofp.mir
index 07f47c559a77..455e333a0eb2 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-uitofp.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-uitofp.mir
@@ -11,7 +11,8 @@ body: |
     liveins: $sgpr0
     ; CHECK-LABEL: name: uitofp_s
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK: [[UITOFP:%[0-9]+]]:vgpr(s32) = G_UITOFP [[COPY]](s32)
+    ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[UITOFP:%[0-9]+]]:vgpr(s32) = G_UITOFP [[COPY1]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_UITOFP %0
 ...

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umax.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umax.mir
index e88eb912e8ac..25ab908a3f53 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umax.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umax.mir
@@ -36,11 +36,13 @@ body: |
     ; FAST-LABEL: name: umax_s32_sv
     ; FAST: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; FAST: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; FAST: [[UMAX:%[0-9]+]]:vgpr(s32) = G_UMAX [[COPY]], [[COPY1]]
+    ; FAST: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; FAST: [[UMAX:%[0-9]+]]:vgpr(s32) = G_UMAX [[COPY2]], [[COPY1]]
     ; GREEDY-LABEL: name: umax_s32_sv
     ; GREEDY: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GREEDY: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; GREEDY: [[UMAX:%[0-9]+]]:vgpr(s32) = G_UMAX [[COPY]], [[COPY1]]
+    ; GREEDY: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GREEDY: [[UMAX:%[0-9]+]]:vgpr(s32) = G_UMAX [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_UMAX %0, %1
@@ -62,7 +64,8 @@ body: |
     ; GREEDY-LABEL: name: umax_s32_vs
     ; GREEDY: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; GREEDY: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; GREEDY: [[UMAX:%[0-9]+]]:vgpr(s32) = G_UMAX [[COPY]], [[COPY1]]
+    ; GREEDY: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; GREEDY: [[UMAX:%[0-9]+]]:vgpr(s32) = G_UMAX [[COPY]], [[COPY2]]
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s32) = G_UMAX %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umin.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umin.mir
index de3f1887fbf1..01efcd6f1ea3 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umin.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umin.mir
@@ -36,11 +36,13 @@ body: |
     ; FAST-LABEL: name: umin_s32_sv
     ; FAST: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; FAST: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; FAST: [[UMIN:%[0-9]+]]:vgpr(s32) = G_UMIN [[COPY]], [[COPY1]]
+    ; FAST: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; FAST: [[UMIN:%[0-9]+]]:vgpr(s32) = G_UMIN [[COPY2]], [[COPY1]]
     ; GREEDY-LABEL: name: umin_s32_sv
     ; GREEDY: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GREEDY: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; GREEDY: [[UMIN:%[0-9]+]]:vgpr(s32) = G_UMIN [[COPY]], [[COPY1]]
+    ; GREEDY: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GREEDY: [[UMIN:%[0-9]+]]:vgpr(s32) = G_UMIN [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_UMIN %0, %1
@@ -62,7 +64,8 @@ body: |
     ; GREEDY-LABEL: name: umin_s32_vs
     ; GREEDY: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; GREEDY: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; GREEDY: [[UMIN:%[0-9]+]]:vgpr(s32) = G_UMIN [[COPY]], [[COPY1]]
+    ; GREEDY: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; GREEDY: [[UMIN:%[0-9]+]]:vgpr(s32) = G_UMIN [[COPY]], [[COPY2]]
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $sgpr0
     %2:_(s32) = G_UMIN %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umulh.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umulh.mir
index f62cd4cd4695..2c3fcd28414a 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umulh.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umulh.mir
@@ -16,8 +16,9 @@ body: |
     ; GFX6-LABEL: name: umulh_s32_ss
     ; GFX6: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GFX6: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
-    ; GFX6: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
-    ; GFX6: [[UMULH:%[0-9]+]]:vgpr(s32) = G_UMULH [[COPY]], [[COPY2]]
+    ; GFX6: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GFX6: [[COPY3:%[0-9]+]]:vgpr(s32) = COPY [[COPY1]](s32)
+    ; GFX6: [[UMULH:%[0-9]+]]:vgpr(s32) = G_UMULH [[COPY2]], [[COPY3]]
     ; GFX9-LABEL: name: umulh_s32_ss
     ; GFX9: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GFX9: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
@@ -38,11 +39,13 @@ body: |
     ; GFX6-LABEL: name: umulh_s32_sv
     ; GFX6: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GFX6: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; GFX6: [[UMULH:%[0-9]+]]:vgpr(s32) = G_UMULH [[COPY]], [[COPY1]]
+    ; GFX6: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GFX6: [[UMULH:%[0-9]+]]:vgpr(s32) = G_UMULH [[COPY2]], [[COPY1]]
     ; GFX9-LABEL: name: umulh_s32_sv
     ; GFX9: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; GFX9: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; GFX9: [[UMULH:%[0-9]+]]:vgpr(s32) = G_UMULH [[COPY]], [[COPY1]]
+    ; GFX9: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; GFX9: [[UMULH:%[0-9]+]]:vgpr(s32) = G_UMULH [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_UMULH %0, %1

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-xor.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-xor.mir
index d42af7019a64..a958511118f4 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-xor.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-xor.mir
@@ -28,7 +28,8 @@ body: |
     ; CHECK-LABEL: name: xor_s32_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK: [[XOR:%[0-9]+]]:vgpr(s32) = G_XOR [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
+    ; CHECK: [[XOR:%[0-9]+]]:vgpr(s32) = G_XOR [[COPY2]], [[COPY1]]
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = G_XOR %0, %1
@@ -706,7 +707,8 @@ body: |
     ; CHECK-LABEL: name: xor_v2s16_sv
     ; CHECK: [[COPY:%[0-9]+]]:sgpr(<2 x s16>) = COPY $sgpr0
     ; CHECK: [[COPY1:%[0-9]+]]:vgpr(<2 x s16>) = COPY $vgpr0
-    ; CHECK: [[XOR:%[0-9]+]]:vgpr(<2 x s16>) = G_XOR [[COPY]], [[COPY1]]
+    ; CHECK: [[COPY2:%[0-9]+]]:vgpr(<2 x s16>) = COPY [[COPY]](<2 x s16>)
+    ; CHECK: [[XOR:%[0-9]+]]:vgpr(<2 x s16>) = G_XOR [[COPY2]], [[COPY1]]
     %0:_(<2 x s16>) = COPY $sgpr0
     %1:_(<2 x s16>) = COPY $vgpr0
     %2:_(<2 x s16>) = G_XOR %0, %1


        


More information about the llvm-commits mailing list