[llvm] Address Codegen bug related to marking subregister MachineOperand defines as undef (PR #134929)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 8 13:46:35 PDT 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-backend-amdgpu
Author: Ryan Buchner (bababuck)
<details>
<summary>Changes</summary>
Fixes #<!-- -->130884.
In the failing case, the following COPIES are moved such that they swap in order:
```
96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
112B undef %18.sub1:vreg_64 = COPY killed $vgpr4
128B %18.sub0:vreg_64 = COPY killed $vgpr5
```
However, the `undef` flag is improperly left of the 112B instrcution, resulting in:
```
96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
128B %18.sub0:vreg_64 = COPY killed $vgpr5
112B undef %18.sub1:vreg_64 = COPY killed $vgpr4
```
This causes the liveness verifier to flag this as an error.
The proper result code should be:
```
96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
128B undef %18.sub0:vreg_64 = COPY killed $vgpr5
112B %18.sub1:vreg_64 = COPY killed $vgpr4
```
Notes:
- An alternative solution would be to prevent the re-ordering in this situation.
- I made the change inside `reschedulePhysReg()`, but we could instead move the change up the call tree to `moveInstruction()`, thoughts? I avoided doing that to limit the scope of this change.
- Passes lit and unit tests.
@<!-- -->arsenm
---
Full diff: https://github.com/llvm/llvm-project/pull/134929.diff
2 Files Affected:
- (modified) llvm/lib/CodeGen/MachineScheduler.cpp (+25)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll (+7-8)
``````````diff
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 5086ee8829b25..4999e5af7f1e4 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -3979,6 +3979,31 @@ void GenericScheduler::reschedulePhysReg(SUnit *SU, bool isTop) {
continue;
LLVM_DEBUG(dbgs() << " Rescheduling physreg copy ";
DAG->dumpNode(*Dep.getSUnit()));
+
+ // Check to make sure that there are no subreg defintions of the given
+ // register between it's new and old location that are marked as undef. If
+ // so, mark the current instruction as undef instead.
+ SmallVector<MachineOperand*, 1> SubregDefs;
+ for (MachineOperand& MO : Copy->operands()) {
+ if (MO.isReg() && MO.isDef() && MO.getSubReg() != 0) {
+ SubregDefs.push_back(&MO);
+ }
+ }
+ if (SubregDefs.size()) {
+ for (auto CurrInst = InsertPos; CurrInst != Copy; ++CurrInst) {
+ for (MachineOperand& MO : CurrInst->operands()) {
+ if (MO.isReg() && MO.isDef() && MO.isUndef() && MO.getSubReg() != 0) {
+ for (auto *MISubregDef : SubregDefs) {
+ if (MISubregDef->getReg() == MO.getReg()) {
+ assert(!MISubregDef->isUndef() && "Register defined as undef twice.");
+ MO.setIsUndef(false);
+ MISubregDef->setIsUndef(true);
+ }
+ }
+ }
+ }
+ }
+ }
DAG->moveInstruction(Copy, InsertPos);
}
}
diff --git a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
index 936118750ff10..71e0e5afa81a0 100644
--- a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
+++ b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
@@ -1,8 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; FIXME: Fails expensive checks, should re-enable verifier, see issue #130884
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
; Test that we can form v_pk_mov_b32 in certain shuffles when they
; originate from 32-bit physreg copy sequences.
@@ -735,10 +734,10 @@ define i32 @shufflevector_v4i32_3210_physreg_even_vgpr_quad_copy_other_use_elt(p
; GFX900-NEXT: ;;#ASMSTART
; GFX900-NEXT: ; def v4, v5, v6, v7
; GFX900-NEXT: ;;#ASMEND
-; GFX900-NEXT: v_mov_b32_e32 v9, v5
-; GFX900-NEXT: v_mov_b32_e32 v8, v6
-; GFX900-NEXT: v_mov_b32_e32 v10, v4
-; GFX900-NEXT: global_store_dwordx4 v0, v[7:10], s[16:17]
+; GFX900-NEXT: v_mov_b32_e32 v3, v5
+; GFX900-NEXT: v_mov_b32_e32 v2, v6
+; GFX900-NEXT: v_mov_b32_e32 v1, v7
+; GFX900-NEXT: global_store_dwordx4 v0, v[1:4], s[16:17]
; GFX900-NEXT: v_mov_b32_e32 v0, v6
; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]
``````````
</details>
https://github.com/llvm/llvm-project/pull/134929
More information about the llvm-commits
mailing list