[llvm] Address Codegen bug related to marking subregister MachineOperand defines as undef (PR #134929)

Tue Apr 8 13:46:35 PDT 2025

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Ryan Buchner (bababuck)

<details>
<summary>Changes</summary>

Fixes #130884.

In the failing case, the following COPIES are moved such that they swap in order:
```
  96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
  112B undef %18.sub1:vreg_64 = COPY killed $vgpr4
  128B %18.sub0:vreg_64 = COPY killed $vgpr5
```
However, the `undef` flag is improperly left of the 112B instrcution, resulting in:
```
  96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
  128B %18.sub0:vreg_64 = COPY killed $vgpr5
  112B undef %18.sub1:vreg_64 = COPY killed $vgpr4
```
This causes the liveness verifier to flag this as an error.

The proper result code should be:
```
  96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
  128B undef %18.sub0:vreg_64 = COPY killed $vgpr5
  112B %18.sub1:vreg_64 = COPY killed $vgpr4
```
 
Notes:
- An alternative solution would be to prevent the re-ordering in this situation.

- I made the change inside `reschedulePhysReg()`, but we could instead move the change up the call tree to `moveInstruction()`, thoughts? I avoided doing that to limit the scope of this change.

- Passes lit and unit tests.

@arsenm 

---
Full diff: https://github.com/llvm/llvm-project/pull/134929.diff


2 Files Affected:

- (modified) llvm/lib/CodeGen/MachineScheduler.cpp (+25) 
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll (+7-8) 


``````````diff

diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 5086ee8829b25..4999e5af7f1e4 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -3979,6 +3979,31 @@ void GenericScheduler::reschedulePhysReg(SUnit *SU, bool isTop) {
       continue;
     LLVM_DEBUG(dbgs() << "  Rescheduling physreg copy ";
                DAG->dumpNode(*Dep.getSUnit()));
+
+    // Check to make sure that there are no subreg defintions of the given
+    // register between it's new and old location that are marked as undef. If
+    // so, mark the current instruction as undef instead.
+    SmallVector<MachineOperand*, 1> SubregDefs;
+    for (MachineOperand& MO : Copy->operands()) {
+      if (MO.isReg() && MO.isDef() && MO.getSubReg() != 0) {
+        SubregDefs.push_back(&MO);
+      }
+    }
+    if (SubregDefs.size()) {
+      for (auto CurrInst = InsertPos; CurrInst != Copy; ++CurrInst) {
+        for (MachineOperand& MO : CurrInst->operands()) {
+          if (MO.isReg() && MO.isDef() && MO.isUndef() && MO.getSubReg() != 0) {
+            for (auto *MISubregDef : SubregDefs) {
+              if (MISubregDef->getReg() == MO.getReg()) {
+                assert(!MISubregDef->isUndef() && "Register defined as undef twice.");
+                MO.setIsUndef(false);
+                MISubregDef->setIsUndef(true);
+              }
+            }
+          }
+        }
+      }
+    }
     DAG->moveInstruction(Copy, InsertPos);
   }
 }
diff --git a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
index 936118750ff10..71e0e5afa81a0 100644
--- a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
+++ b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
@@ -1,8 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; FIXME: Fails expensive checks, should re-enable verifier, see issue #130884
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
 
 ; Test that we can form v_pk_mov_b32 in certain shuffles when they
 ; originate from 32-bit physreg copy sequences.
@@ -735,10 +734,10 @@ define i32 @shufflevector_v4i32_3210_physreg_even_vgpr_quad_copy_other_use_elt(p
 ; GFX900-NEXT:    ;;#ASMSTART
 ; GFX900-NEXT:    ; def v4, v5, v6, v7
 ; GFX900-NEXT:    ;;#ASMEND
-; GFX900-NEXT:    v_mov_b32_e32 v9, v5
-; GFX900-NEXT:    v_mov_b32_e32 v8, v6
-; GFX900-NEXT:    v_mov_b32_e32 v10, v4
-; GFX900-NEXT:    global_store_dwordx4 v0, v[7:10], s[16:17]
+; GFX900-NEXT:    v_mov_b32_e32 v3, v5
+; GFX900-NEXT:    v_mov_b32_e32 v2, v6
+; GFX900-NEXT:    v_mov_b32_e32 v1, v7
+; GFX900-NEXT:    global_store_dwordx4 v0, v[1:4], s[16:17]
 ; GFX900-NEXT:    v_mov_b32_e32 v0, v6
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]

``````````

</details>


https://github.com/llvm/llvm-project/pull/134929