[llvm] Address Codegen bug related to marking subregister MachineOperand defines as undef (PR #134929)
Ryan Buchner via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 8 13:46:01 PDT 2025
https://github.com/bababuck created https://github.com/llvm/llvm-project/pull/134929
Fixes #130884.
In the failing case, the following COPIES are moved such that they swap in order:
```
96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
112B undef %18.sub1:vreg_64 = COPY killed $vgpr4
128B %18.sub0:vreg_64 = COPY killed $vgpr5
```
However, the `undef` flag is improperly left of the 112B instrcution, resulting in:
```
96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
128B %18.sub0:vreg_64 = COPY killed $vgpr5
112B undef %18.sub1:vreg_64 = COPY killed $vgpr4
```
This causes the liveness verifier to flag this as an error.
The proper result code should be:
```
96B INLINEASM &"; def $0, $1" [attdialect], $0:[regdef], implicit-def $vgpr4, $1:[regdef], implicit-def $vgpr5
128B undef %18.sub0:vreg_64 = COPY killed $vgpr5
112B %18.sub1:vreg_64 = COPY killed $vgpr4
```
Notes:
- An alternative solution would be to prevent the re-ordering in this situation.
- I made the change inside `reschedulePhysReg()`, but we could instead move the change up the call tree to `moveInstruction()`, thoughts? I avoided doing that to limit the scope of this change.
- Passes lit and unit tests.
@arsenm
>From 5d38944f6dae8971ffe2dac5531d883f079f95b1 Mon Sep 17 00:00:00 2001
From: bababuck <buchner.ryan at gmail.com>
Date: Tue, 8 Apr 2025 13:32:15 -0700
Subject: [PATCH 1/2] Revert "AMDGPU: Disable machine verifier in failing test"
This reverts commit bf2d1c46072a0461cb3ddcaefcafeccc2637995e.
---
llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
index 936118750ff10..e130ce0a45467 100644
--- a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
+++ b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
@@ -1,8 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; FIXME: Fails expensive checks, should re-enable verifier, see issue #130884
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -verify-machineinstrs=0 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GFX9,GFX900 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX90A %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 < %s | FileCheck -check-prefixes=GFX9,GFX90APLUS,GFX940 %s
; Test that we can form v_pk_mov_b32 in certain shuffles when they
; originate from 32-bit physreg copy sequences.
>From 7dbb3b0cab0d34db61f5e4db0587e190564ae5d3 Mon Sep 17 00:00:00 2001
From: bababuck <buchner.ryan at gmail.com>
Date: Tue, 8 Apr 2025 12:10:45 -0700
Subject: [PATCH 2/2] Properly handle undef statements in
ScheduleDAGMI::moveInstruction
If a subregister define is moved before a define of a different subregister
(but same base regsiter) and the other fine marks the remainder of the register
as undefined, the moved define will be overwritten by the undef. To prevent
this, move the `undef` to the moved instruction.
Fixes #130884.
---
llvm/lib/CodeGen/MachineScheduler.cpp | 25 +++++++++++++++++++
.../AMDGPU/shufflevector-physreg-copy.ll | 8 +++---
2 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 5086ee8829b25..4999e5af7f1e4 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -3979,6 +3979,31 @@ void GenericScheduler::reschedulePhysReg(SUnit *SU, bool isTop) {
continue;
LLVM_DEBUG(dbgs() << " Rescheduling physreg copy ";
DAG->dumpNode(*Dep.getSUnit()));
+
+ // Check to make sure that there are no subreg defintions of the given
+ // register between it's new and old location that are marked as undef. If
+ // so, mark the current instruction as undef instead.
+ SmallVector<MachineOperand*, 1> SubregDefs;
+ for (MachineOperand& MO : Copy->operands()) {
+ if (MO.isReg() && MO.isDef() && MO.getSubReg() != 0) {
+ SubregDefs.push_back(&MO);
+ }
+ }
+ if (SubregDefs.size()) {
+ for (auto CurrInst = InsertPos; CurrInst != Copy; ++CurrInst) {
+ for (MachineOperand& MO : CurrInst->operands()) {
+ if (MO.isReg() && MO.isDef() && MO.isUndef() && MO.getSubReg() != 0) {
+ for (auto *MISubregDef : SubregDefs) {
+ if (MISubregDef->getReg() == MO.getReg()) {
+ assert(!MISubregDef->isUndef() && "Register defined as undef twice.");
+ MO.setIsUndef(false);
+ MISubregDef->setIsUndef(true);
+ }
+ }
+ }
+ }
+ }
+ }
DAG->moveInstruction(Copy, InsertPos);
}
}
diff --git a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
index e130ce0a45467..71e0e5afa81a0 100644
--- a/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
+++ b/llvm/test/CodeGen/AMDGPU/shufflevector-physreg-copy.ll
@@ -734,10 +734,10 @@ define i32 @shufflevector_v4i32_3210_physreg_even_vgpr_quad_copy_other_use_elt(p
; GFX900-NEXT: ;;#ASMSTART
; GFX900-NEXT: ; def v4, v5, v6, v7
; GFX900-NEXT: ;;#ASMEND
-; GFX900-NEXT: v_mov_b32_e32 v9, v5
-; GFX900-NEXT: v_mov_b32_e32 v8, v6
-; GFX900-NEXT: v_mov_b32_e32 v10, v4
-; GFX900-NEXT: global_store_dwordx4 v0, v[7:10], s[16:17]
+; GFX900-NEXT: v_mov_b32_e32 v3, v5
+; GFX900-NEXT: v_mov_b32_e32 v2, v6
+; GFX900-NEXT: v_mov_b32_e32 v1, v7
+; GFX900-NEXT: global_store_dwordx4 v0, v[1:4], s[16:17]
; GFX900-NEXT: v_mov_b32_e32 v0, v6
; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]
More information about the llvm-commits
mailing list