[llvm] [AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. (PR #69957)

Mon Oct 23 11:18:56 PDT 2023

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Valery Pykhtin (vpykhtin)

<details>
<summary>Changes</summary>

For the following testcase:

undef %1.sub1:**sgpr_96** = COPY undef %0:sgpr_32
%3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, 2, undef %2:vgpr_32, implicit $exec

GCNRewritePartialRegUses produced:

%4:**vgpr_32** = COPY undef %1:sgpr_32
dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, 2, undef %3:vgpr_32, implicit $exec

Register class for %4 is incorrect: there should be _sgpr_32_ instead of _vgpr_32_ because the original %1 had scalar regclass. This happens because %4 participate only in two instructions from which:
-    def in COPY has no reglass information from the instruction description.
-    use in V_LSHL_ADD_U32_e64 has _VS_32_ class from the instruction description.

Thus GCNRewritePartialRegUses used only  _VS_32_ class and selected vector subclass as the largest - what it missed here is that it should take into account the regclass for _%1.sub1:sgpr_96_. This patch fixes that, debug output after the fix:

```
Try to rewrite partial reg %0:SGPR_96
  sub1:SGPR_32 ; <- regclass for %1.sub1:sgpr_96
  sub1:SGPR_32 & VS_32 = SGPR_32 ; <- common class for %1.sub1:sgpr_96 and VS_32 from V_LSHL_ADD_U32_e64 opnd
  Shift 32, reg align 32
  sub1:SGPR_32 -> whole reg, num regclasses 1
  Success %0:SGPR_96 -> %4:SGPR_32
```

And here starts another story. All of this deduction of final regclass using regclasses from instruction operands description is really not needed because _TRI->getSubRegisterClass_ for _%1.sub1:sgpr_96_ is already returning the right class - _SGPR_32_ as we see in the debug output above but that haven't been always like that. I did a bisect search and found that _TRI->getSubRegisterClass_ behavior has been fixed by the https://github.com/llvm/llvm-project/pull/67245, thanks to @kosarev. Before his fix getSubRegisterClass returned strange classes like:

`SGPR_128:sub0_sub1 -> SReg_1_with_sub0_and_SReg_1_with_lo16_in_SGPR_LO16`

which isn't even allocatable.

I'm going to cleanup the implementation but I would like to do this in a separate commit. I've tested my patch before the https://github.com/llvm/llvm-project/pull/67245 and it also works.

Note that GCNRewritePartialRegUses pass isn't enabled by default yet.

---

Patch is 321.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69957.diff


4 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp (+30-29) 
- (modified) llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir (+15-16) 
- (modified) llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir (+1806-1806) 
- (modified) llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses.mir (+35-25) 


``````````diff

diff --git a/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp b/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
index 99db7e4af9fd1c9..c94e5c16fb2cf2e 100644
--- a/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
@@ -101,17 +101,16 @@ class GCNRewritePartialRegUses : public MachineFunctionPass {
   /// find new regclass such that:
   ///   1. It has subregs obtained by shifting each OldSubReg by RShift number
   ///      of bits to the right. Every "shifted" subreg should have the same
-  ///      SubRegRC. SubRegRC can be null, in this case it initialized using
-  ///      getSubRegisterClass. If CoverSubregIdx is not zero it's a subreg that
-  ///      "covers" all other subregs in pairs. Basically such subreg becomes a
-  ///      whole register.
+  ///      SubRegRC. If CoverSubregIdx is not zero it's a subreg that "covers"
+  ///      all other subregs in pairs. Basically such subreg becomes a whole
+  ///      register.
   ///   2. Resulting register class contains registers of minimal size but not
   ///      less than RegNumBits.
   ///
   /// SubRegs is map of OldSubReg -> [SubRegRC, NewSubReg] and is used as in/out
   /// parameter:
   ///   OldSubReg - input parameter,
-  ///   SubRegRC  - in/out, should be changed for unknown regclass,
+  ///   SubRegRC  - input parameter (cannot be null),
   ///   NewSubReg - output, contains shifted subregs on return.
   const TargetRegisterClass *
   getRegClassWithShiftedSubregs(const TargetRegisterClass *RC, unsigned RShift,
@@ -228,19 +227,7 @@ GCNRewritePartialRegUses::getRegClassWithShiftedSubregs(
   BitVector ClassMask(getAllocatableAndAlignedRegClassMask(RCAlign));
   for (auto &[OldSubReg, SRI] : SubRegs) {
     auto &[SubRegRC, NewSubReg] = SRI;
-
-    // Register class may be unknown, for example:
-    //   undef %0.sub4:sgpr_1024 = S_MOV_B32 01
-    //   %0.sub5:sgpr_1024 = S_MOV_B32 02
-    //   %1:vreg_64 = COPY %0.sub4_sub5
-    // Register classes for subregs 'sub4' and 'sub5' are known from the
-    // description of destination operand of S_MOV_B32 instruction but the
-    // class for the subreg 'sub4_sub5' isn't specified by the COPY instruction.
-    if (!SubRegRC)
-      SubRegRC = TRI->getSubRegisterClass(RC, OldSubReg);
-
-    if (!SubRegRC)
-      return nullptr;
+    assert(SubRegRC);
 
     LLVM_DEBUG(dbgs() << "  " << TRI->getSubRegIndexName(OldSubReg) << ':'
                       << TRI->getRegClassName(SubRegRC)
@@ -248,6 +235,8 @@ GCNRewritePartialRegUses::getRegClassWithShiftedSubregs(
                       << " -> ");
 
     if (OldSubReg == CoverSubregIdx) {
+      // Covering subreg will become a full register, RC should be allocatable.
+      assert(SubRegRC->isAllocatable());
       NewSubReg = AMDGPU::NoSubRegister;
       LLVM_DEBUG(dbgs() << "whole reg");
     } else {
@@ -425,7 +414,7 @@ bool GCNRewritePartialRegUses::rewriteReg(Register Reg) const {
     return false;
 
   for (MachineOperand &MO : Range) {
-    if (MO.getSubReg() == AMDGPU::NoSubRegister) // Whole reg used, quit.
+    if (MO.getSubReg() == AMDGPU::NoSubRegister) // Whole reg used, quit. [1]
       return false;
   }
 
@@ -433,21 +422,33 @@ bool GCNRewritePartialRegUses::rewriteReg(Register Reg) const {
   LLVM_DEBUG(dbgs() << "Try to rewrite partial reg " << printReg(Reg, TRI)
                     << ':' << TRI->getRegClassName(RC) << '\n');
 
-  // Collect used subregs and constrained reg classes infered from instruction
+  // Collect used subregs and their reg classes infered from instruction
   // operands.
   SubRegMap SubRegs;
   for (MachineOperand &MO : MRI->reg_nodbg_operands(Reg)) {
-    assert(MO.getSubReg() != AMDGPU::NoSubRegister);
-    auto *OpDescRC = getOperandRegClass(MO);
-    const auto [I, Inserted] = SubRegs.try_emplace(MO.getSubReg(), OpDescRC);
-    if (!Inserted && OpDescRC) {
-      SubRegInfo &SRI = I->second;
-      SRI.RC = SRI.RC ? TRI->getCommonSubClass(SRI.RC, OpDescRC) : OpDescRC;
-      if (!SRI.RC) {
-        LLVM_DEBUG(dbgs() << "  Couldn't find common target regclass\n");
-        return false;
+    const unsigned SubReg = MO.getSubReg();
+    assert(SubReg != AMDGPU::NoSubRegister); // Due to [1].
+    LLVM_DEBUG(dbgs() << "  " << TRI->getSubRegIndexName(SubReg) << ':');
+
+    const auto [I, Inserted] = SubRegs.try_emplace(SubReg);
+    const TargetRegisterClass *&SubRegRC = I->second.RC;
+
+    if (Inserted)
+      SubRegRC = TRI->getSubRegisterClass(RC, SubReg);
+
+    if (SubRegRC) {
+      if (const TargetRegisterClass *OpDescRC = getOperandRegClass(MO)) {
+        LLVM_DEBUG(dbgs() << TRI->getRegClassName(SubRegRC) << " & "
+                          << TRI->getRegClassName(OpDescRC) << " = ");
+        SubRegRC = TRI->getCommonSubClass(SubRegRC, OpDescRC);
       }
     }
+
+    if (!SubRegRC) {
+      LLVM_DEBUG(dbgs() << "couldn't find target regclass\n");
+      return false;
+    } else
+      LLVM_DEBUG(dbgs() << TRI->getRegClassName(SubRegRC) << '\n');
   }
 
   auto *NewRC = getMinSizeReg(RC, SubRegs);
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
index 4e1f912a9d6f98e..85d0c054754d03d 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
@@ -7,7 +7,6 @@
     unreachable, !dbg !11
   }
 
-  ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
   declare void @llvm.dbg.value(metadata, metadata, metadata) #0
 
   attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
@@ -36,21 +35,21 @@ name:            test_vreg_96_w64
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_96_w64
-    ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
-    ; CHECK-NEXT: DBG_VALUE %3.sub0, $noreg, !9, !DIExpression(), debug-location !11
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %3.sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 2, column: 1, scope: !5)
-    ; CHECK-NEXT: S_NOP 0, implicit %3, debug-location !DILocation(line: 3, column: 1, scope: !5)
-    ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %4.sub0, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %4.sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: S_NOP 0, implicit %4, debug-location !DILocation(line: 6, column: 1, scope: !5)
-    ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %5, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %5, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: S_NOP 0, implicit %5, debug-location !DILocation(line: 6, column: 1, scope: !5)
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_]].sub0, $noreg, !9, !DIExpression(), debug-location !11
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_]].sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 2, column: 1, scope: !5)
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]], debug-location !DILocation(line: 3, column: 1, scope: !5)
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_1]].sub0, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_1]].sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]], debug-location !DILocation(line: 6, column: 1, scope: !5)
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_2]], $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_2]], $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]], debug-location !DILocation(line: 6, column: 1, scope: !5)
     undef %0.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
     DBG_VALUE %0.sub0, $noreg, !9, !DIExpression(), debug-location !11
     %0.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
index d51e63f92e69141..037f39df8c3e06e 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
@@ -6,26 +6,26 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_subregs_composition_vreg_1024
-    ; CHECK: undef %5.sub0:vreg_96 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_96 = V_MOV_B32_e32 2, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5.sub0_sub1
-    ; CHECK-NEXT: S_NOP 0, implicit %5.sub1_sub2
-    ; CHECK-NEXT: undef %6.sub0:vreg_128 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %6.sub1:vreg_128 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %6.sub0_sub1_sub2
-    ; CHECK-NEXT: S_NOP 0, implicit %6.sub1_sub2_sub3
-    ; CHECK-NEXT: undef %7.sub0:vreg_160 = V_MOV_B32_e32 21, implicit $exec
-    ; CHECK-NEXT: %7.sub1:vreg_160 = V_MOV_B32_e32 22, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %7.sub0_sub1_sub2_sub3
-    ; CHECK-NEXT: S_NOP 0, implicit %7.sub1_sub2_sub3_sub4
-    ; CHECK-NEXT: undef %8.sub0:vreg_192 = V_MOV_B32_e32 31, implicit $exec
-    ; CHECK-NEXT: %8.sub1:vreg_192 = V_MOV_B32_e32 32, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %8.sub0_sub1_sub2_sub3_sub4
-    ; CHECK-NEXT: S_NOP 0, implicit %8.sub1_sub2_sub3_sub4_sub5
-    ; CHECK-NEXT: undef %9.sub0:vreg_256 = V_MOV_B32_e32 41, implicit $exec
-    ; CHECK-NEXT: %9.sub2:vreg_256 = V_MOV_B32_e32 43, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %9.sub0_sub1_sub2_sub3_sub4_sub5
-    ; CHECK-NEXT: S_NOP 0, implicit %9.sub2_sub3_sub4_sub5_sub6_sub7
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]].sub0_sub1
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]].sub1_sub2
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_128 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_128 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]].sub0_sub1_sub2
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]].sub1_sub2_sub3
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_160 = V_MOV_B32_e32 21, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_160 = V_MOV_B32_e32 22, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]].sub0_sub1_sub2_sub3
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]].sub1_sub2_sub3_sub4
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_3:%[0-9]+]].sub0:vreg_192 = V_MOV_B32_e32 31, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_3:%[0-9]+]].sub1:vreg_192 = V_MOV_B32_e32 32, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_3]].sub0_sub1_sub2_sub3_sub4
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_3]].sub1_sub2_sub3_sub4_sub5
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_4:%[0-9]+]].sub0:vreg_256 = V_MOV_B32_e32 41, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_4:%[0-9]+]].sub2:vreg_256 = V_MOV_B32_e32 43, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_4]].sub0_sub1_sub2_sub3_sub4_sub5
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_4]].sub2_sub3_sub4_sub5_sub6_sub7
     undef %0.sub1:vreg_1024 = V_MOV_B32_e32 01, implicit $exec
     %0.sub2:vreg_1024 = V_MOV_B32_e32 02, implicit $exec
     S_NOP 0, implicit %0.sub1_sub2
@@ -97,12 +97,12 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_96_w64
-    ; CHECK: undef %2.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %2.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %2
-    ; CHECK-NEXT: undef %3.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
     undef %0.sub0:vreg_96 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_96 = V_MOV_B32_e32 01, implicit $exec
     S_NOP 0, implicit %0.sub0_sub1
@@ -140,15 +140,15 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_128_w64
-    ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
-    ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %4
-    ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 22, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 23, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 22, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 23, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]]
     undef %0.sub0:vreg_128 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_128 = V_MOV_B32_e32 01, implicit $exec
     S_NOP 0, implicit %0.sub0_sub1
@@ -168,14 +168,14 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_128_w96
-    ; CHECK: undef %2.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %2.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: %2.sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %2
-    ; CHECK-NEXT: undef %3.sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: %3.sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
     undef %0.sub0:vreg_128 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_128 = V_MOV_B32_e32 01, implicit $exec
     %0.sub2:vreg_128 = V_MOV_B32_e32 02, implicit $exec
@@ -215,15 +215,15 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_160_w64
-    ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
-    ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %4
-    ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 23, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 24, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 23, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 24, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]]
     undef %0.sub0:vreg_160 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_160 = V_MOV_B32_e32 01, implicit $exec
     S_NOP 0, implicit %0.sub0_sub1
@@ -243,18 +243,18 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_160_w96
-    ; CHECK: undef %3.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: %3.sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
-    ; CHECK-NEXT: undef %4.sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %4.sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: %4.sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %4
-    ; CHECK-NEXT: undef %5.sub0:vreg_96 = V_MOV_B32_e32 22, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_96 = V_MOV_B32_e32 23, implicit $exec
-    ; CHECK-NEXT: %5.sub2:vreg_96 = V_MOV_B32_e32 24, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+    ; CHECK-NEX...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/69957