[llvm] 7f1a744 - [TailDup][MachineSSAUpdater] Let RewriteUse insert a COPY when needed (#95553)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 25 01:05:19 PDT 2024
Author: Björn Pettersson
Date: 2024-06-25T10:05:14+02:00
New Revision: 7f1a74429dfd62a410d4b51d2e75d3677429a51a
URL: https://github.com/llvm/llvm-project/commit/7f1a74429dfd62a410d4b51d2e75d3677429a51a
DIFF: https://github.com/llvm/llvm-project/commit/7f1a74429dfd62a410d4b51d2e75d3677429a51a.diff
LOG: [TailDup][MachineSSAUpdater] Let RewriteUse insert a COPY when needed (#95553)
When running early-tailduplication we've seen problems with machine
verifier errors due to register class mismatches after doing the machine
SSA updates.
Typical scenario is that there is a PHI node and another instruction
that is using the same vreg:
%othervreg:otherclass = PHI %vreg:origclass, %bb
MInstr %vreg:origclass
but then after TailDuplicator::tailDuplicateAndUpdate we get
%othervreg:otherclass = PHI %vreg:origclass, %bb, ...
MInstr %othervreg:otherclass
Such rewrites are only valid if 'otherclass' is equal to (or a subclass
of) 'origclass'.
The solution here is based on adding a COPY instruction to make sure we
satisfy constraints given by 'MInstr' in the example. So if 'otherclass'
isn't equal to (or a subclass of) 'origclass' we insert a copy after the
PHI like this:
%othervreg:otherclass = PHI %vreg:origclass, %bb, ...
%newvreg:origclass = COPY %othervreg:otherclass
MInstr %newvreg:origclass
A special case is when it is possible to constrain the register class
instead of inserting a COPY. We currently prefer to constrain the
register class instead of inserting a COPY, even if it is a bit unclear
if that always is better (considering register pressure for the
constrained class etc.).
Fixes: https://github.com/llvm/llvm-project/issues/62712
Added:
llvm/test/CodeGen/AArch64/taildup-ssa-update-pr62712.mir
Modified:
llvm/lib/CodeGen/MachineSSAUpdater.cpp
Removed:
################################################################################
diff --git a/llvm/lib/CodeGen/MachineSSAUpdater.cpp b/llvm/lib/CodeGen/MachineSSAUpdater.cpp
index 8d6ea9c488da5..5b21c5653b46b 100644
--- a/llvm/lib/CodeGen/MachineSSAUpdater.cpp
+++ b/llvm/lib/CodeGen/MachineSSAUpdater.cpp
@@ -209,7 +209,7 @@ Register MachineSSAUpdater::GetValueInMiddleOfBlock(MachineBasicBlock *BB,
// If the client wants to know about all new instructions, tell it.
if (InsertedPHIs) InsertedPHIs->push_back(InsertedPHI);
- LLVM_DEBUG(dbgs() << " Inserted PHI: " << *InsertedPHI << "\n");
+ LLVM_DEBUG(dbgs() << " Inserted PHI: " << *InsertedPHI);
return InsertedPHI.getReg(0);
}
@@ -236,6 +236,22 @@ void MachineSSAUpdater::RewriteUse(MachineOperand &U) {
NewVR = GetValueInMiddleOfBlock(UseMI->getParent());
}
+ // Insert a COPY if needed to satisfy register class constraints for the using
+ // MO. Or, if possible, just constrain the class for NewVR to avoid the need
+ // for a COPY.
+ if (NewVR) {
+ const TargetRegisterClass *UseRC =
+ dyn_cast_or_null<const TargetRegisterClass *>(RegAttrs.RCOrRB);
+ if (UseRC && !MRI->constrainRegClass(NewVR, UseRC)) {
+ MachineBasicBlock *UseBB = UseMI->getParent();
+ MachineInstr *InsertedCopy =
+ InsertNewDef(TargetOpcode::COPY, UseBB, UseBB->getFirstNonPHI(),
+ RegAttrs, MRI, TII)
+ .addReg(NewVR);
+ NewVR = InsertedCopy->getOperand(0).getReg();
+ LLVM_DEBUG(dbgs() << " Inserted COPY: " << *InsertedCopy);
+ }
+ }
U.setReg(NewVR);
}
diff --git a/llvm/test/CodeGen/AArch64/taildup-ssa-update-pr62712.mir b/llvm/test/CodeGen/AArch64/taildup-ssa-update-pr62712.mir
new file mode 100755
index 0000000000000..abe1813cb2f0e
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/taildup-ssa-update-pr62712.mir
@@ -0,0 +1,302 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -verify-machineinstrs -mtriple aarch64-none-linux-gnu -run-pass early-tailduplication -o - %s | FileCheck %s
+
+# These tests used to hit verification failure due to register class
+# mismatches after tail dupilication (and the SSA form updates).
+
+# In test1 we have a simple case when the COPY already is duplicated, but
+# TailDuplication will try to eliminate the PHI in bb.2 by adding a new PHI in
+# bb.4. The presence of a PHI node in bb.4, which happens to assign to a
+# fpr32 register, was enough to mess up the result. The PHI node was reused
+# and the use of %2 in the CBNZW was changed into using %3 instead. But the
+# register class of %3 is not correct for CBNZW.
+# The fix involves adding a COPY instruction that moves the value to
+# a register of correct regclass.
+---
+name: test1
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: test1
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3:
+ ; CHECK-NEXT: successors: %bb.3(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: B %bb.3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4:
+ ; CHECK-NEXT: successors: %bb.5(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:fpr32 = PHI %0, %bb.0, %1, %bb.1
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32 = COPY [[PHI]]
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit [[PHI]]
+ ; CHECK-NEXT: CBNZW [[COPY]], %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ bb.0:
+ $x0 = COPY undef $x0, implicit-def %0:gpr32
+ B %bb.2
+
+ bb.1:
+ $x0 = COPY undef $x0, implicit-def %1:gpr32
+
+ bb.2:
+ %2:gpr32 = PHI %1, %bb.1, %0, %bb.0
+ B %bb.4
+
+ bb.3:
+ B %bb.3
+
+ bb.4:
+ %3:fpr32 = PHI %2, %bb.2
+ $x0 = COPY undef $x0, implicit %3:fpr32
+ CBNZW %2, %bb.5
+
+ bb.5:
+
+...
+
+# In test2 there are two PHIs already present, one with the wanted register
+# class. No idea if this is a common scenario in reality (hand written mir
+# test case).
+# FIXME: Can we pick the best PHI directly instead of getting a COPY from the
+# one with wrong register class?
+---
+name: test2
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: test2
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3:
+ ; CHECK-NEXT: successors: %bb.3(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: B %bb.3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4:
+ ; CHECK-NEXT: successors: %bb.5(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:fpr32 = PHI %0, %bb.0, %1, %bb.1
+ ; CHECK-NEXT: [[PHI1:%[0-9]+]]:gpr32 = PHI %0, %bb.0, %1, %bb.1
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32 = COPY [[PHI]]
+ ; CHECK-NEXT: CBNZW [[COPY]], %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit [[PHI]]
+ bb.0:
+ $x0 = COPY undef $x0, implicit-def %0:gpr32
+ B %bb.2
+
+ bb.1:
+ $x0 = COPY undef $x0, implicit-def %1:gpr32
+
+ bb.2:
+ %2:gpr32 = PHI %1, %bb.1, %0, %bb.0
+ B %bb.4
+
+ bb.3:
+ B %bb.3
+
+ bb.4:
+ %3:fpr32 = PHI %2, %bb.2
+ %4:gpr32 = PHI %2, %bb.2
+ CBNZW %2, %bb.5
+
+ bb.5:
+ $x0 = COPY undef $x0, implicit %3:fpr32
+...
+
+# In test3 we have multiple uses, and in multiple BBs. This test is to show
+# that we get one COPY instruction inserted for each BB where there is a use.
+---
+name: test3
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: test3
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3:
+ ; CHECK-NEXT: successors: %bb.3(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: B %bb.3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4:
+ ; CHECK-NEXT: successors: %bb.5(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:fpr32 = PHI %0, %bb.0, %1, %bb.1
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit [[PHI]]
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: successors: %bb.6(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32 = COPY [[PHI]]
+ ; CHECK-NEXT: CBNZW [[COPY]], %bb.6
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.6:
+ ; CHECK-NEXT: successors: %bb.7(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32 = COPY [[PHI]]
+ ; CHECK-NEXT: CBNZW [[COPY1]], %bb.7
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.7:
+ bb.0:
+ $x0 = COPY undef $x0, implicit-def %0:gpr32
+ B %bb.2
+
+ bb.1:
+ $x0 = COPY undef $x0, implicit-def %1:gpr32
+
+ bb.2:
+ %2:gpr32 = PHI %1, %bb.1, %0, %bb.0
+ B %bb.4
+
+ bb.3:
+ B %bb.3
+
+ bb.4:
+ %3:fpr32 = PHI %2, %bb.2
+ $x0 = COPY undef $x0, implicit %3:fpr32
+
+ bb.5:
+ CBNZW %2, %bb.6
+
+ bb.6:
+ CBNZW %2, %bb.7
+
+ bb.7:
+...
+
+# In test4 we do not need to insert a COPY.
+# The register class can be constrained instead.
+---
+name: test4
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: test4
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3:
+ ; CHECK-NEXT: successors: %bb.3(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: B %bb.3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4:
+ ; CHECK-NEXT: successors: %bb.5(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:gpr32common = PHI %0, %bb.0, %1, %bb.1
+ ; CHECK-NEXT: CBNZW [[PHI]], %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ bb.0:
+ $x0 = COPY undef $x0, implicit-def %0:gpr32
+ B %bb.2
+
+ bb.1:
+ $x0 = COPY undef $x0, implicit-def %1:gpr32
+
+ bb.2:
+ %2:gpr32 = PHI %1, %bb.1, %0, %bb.0
+ B %bb.4
+
+ bb.3:
+ B %bb.3
+
+ bb.4:
+ %3:gpr32sp = PHI %2, %bb.2
+ CBNZW %2, %bb.5
+
+ bb.5:
+
+...
+
+# In test5 we do not need to insert a COPY.
+# The register class can be constrained instead.
+---
+name: test5
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: test5
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.4(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3:
+ ; CHECK-NEXT: successors: %bb.3(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: B %bb.3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4:
+ ; CHECK-NEXT: successors: %bb.5(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:gpr32common = PHI %0, %bb.0, %1, %bb.1
+ ; CHECK-NEXT: CBNZW [[PHI]], %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ bb.0:
+ $x0 = COPY undef $x0, implicit-def %0:gpr32common
+ B %bb.2
+
+ bb.1:
+ $x0 = COPY undef $x0, implicit-def %1:gpr32common
+
+ bb.2:
+ %2:gpr32common = PHI %1, %bb.1, %0, %bb.0
+ B %bb.4
+
+ bb.3:
+ B %bb.3
+
+ bb.4:
+ %3:gpr32 = PHI %2, %bb.2
+ CBNZW %2, %bb.5
+
+ bb.5:
+...
More information about the llvm-commits
mailing list