[llvm] [RFC][MC] Cache MCRegAliasIterator (PR #93510)
Pierre van Houtryve via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 3 06:44:49 PDT 2024
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/93510
>From d09e270372ae59180d4edaea24c9b0bd0a1684f2 Mon Sep 17 00:00:00 2001
From: pvanhout <pierre.vanhoutryve at amd.com>
Date: Tue, 28 May 2024 09:18:32 +0200
Subject: [PATCH 1/2] [RFC][MC] Cache MCRegAliasIterator
AMDGPU has a lot of registers, almost 9000. Many of those registers have aliases. For instance, SGPR0 has a ton of aliases due to the presence of register tuples. It's even worse if you query the aliases of a register tuple itself. A large register tuple can have hundreds of aliases because it may include 16 registers, and each of those registers have their own tuples as well.
The current implementation of MCRegAliasIterator is not good at this. In some extreme cases it can iterate (IIRC), 7000 more times than necessary, just giving duplicates over and over again and using a lot of expensive iterators.
This patch implements a cache system for MCRegAliasIterator. It does the expensive part only once and then saves it for us so the next iterations on that register's aliases are just a map lookup.
Furthermore, the cached data is uniqued (and sorted). Thus, this speeds up code by both speeding up the iterator itself, but also by minimizing the number of loop iterations users of the iterator do.
I think this is beneficial for AMDGPU (and some quick profiling showed a few seconds gained on a +-3 minute test build), but may introduce some overhead on smaller targets where this iterator was alreeady cheap. To avoid that, we might consider disabling the cache on targets with a small amount of registers. We could pretty much restrict this cache to AMDGPU only by only enabling it if NumReg is above a certain threshold, like 1000.
I still need to do memory usage measurements, but I think this cache should take at most 10MB on AMDGPU if we were to query the aliases of all registers (which isn't common I think). I don't think this is a concern but if it is, we can always find ways to alleviate this, for instance by only caching after a certain number of hits or by clearing the cache at regular intervals in the backend.
---
llvm/include/llvm/MC/MCRegisterInfo.h | 114 ++++++++++++++----
llvm/lib/MC/MCRegisterInfo.cpp | 20 +++
.../CodeGen/ARM/constant-island-movwt.mir | 14 +--
3 files changed, 117 insertions(+), 31 deletions(-)
diff --git a/llvm/include/llvm/MC/MCRegisterInfo.h b/llvm/include/llvm/MC/MCRegisterInfo.h
index af5be9186108a..79df17d33e420 100644
--- a/llvm/include/llvm/MC/MCRegisterInfo.h
+++ b/llvm/include/llvm/MC/MCRegisterInfo.h
@@ -187,6 +187,9 @@ class MCRegisterInfo {
DenseMap<MCRegister, int> L2SEHRegs; // LLVM to SEH regs mapping
DenseMap<MCRegister, int> L2CVRegs; // LLVM to CV regs mapping
+ mutable DenseMap<MCPhysReg, std::vector<MCPhysReg>> RegAliasesCache;
+ ArrayRef<MCPhysReg> getCachedAliasesOf(MCPhysReg R) const;
+
/// Iterator class that can traverse the differentially encoded values in
/// DiffLists. Don't use this class directly, use one of the adaptors below.
class DiffListIterator
@@ -263,6 +266,7 @@ class MCRegisterInfo {
friend class MCRegUnitIterator;
friend class MCRegUnitMaskIterator;
friend class MCRegUnitRootIterator;
+ friend class MCRegAliasIterator;
/// Initialize MCRegisterInfo, called by TableGen
/// auto-generated routines. *DO NOT USE*.
@@ -726,60 +730,122 @@ class MCRegUnitRootIterator {
/// MCRegAliasIterator enumerates all registers aliasing Reg. If IncludeSelf is
/// set, Reg itself is included in the list. This iterator does not guarantee
/// any ordering or that entries are unique.
+///
+/// This iterator can work in two modes: cached and uncached.
+///
+/// In Uncached mode, this uses RegUnit/RegUnitRoot/SuperReg iterators to
+/// find all aliases. This is very expensive if the target (such as AMDGPU)
+/// has a lot of register and register aliases. It can also cause us
+/// to iterate the same register many times over.
+///
+/// In cached mode, the iterator requests MCRegisterInfo for a cache.
+/// MCRegisterInfo then returns a cached list of sorted, uniqued
+/// aliases for that register. This allows the iterator to be faster
+/// past the first request, but also to iterate much less in some
+/// cases, giving a slight speedup to any code that's using it.
class MCRegAliasIterator {
private:
MCRegister Reg;
const MCRegisterInfo *MCRI;
- bool IncludeSelf;
+ bool IncludeSelf : 1;
+ bool IsCached : 1;
+
+ /// TODO: Should we dynamically allocate this? If we don't make caching
+ /// specific for each target, we probably should in order to keep the size of
+ /// this object under control. RegAliasIterator is currently 80 bytes.
+ struct UncachedData {
+ MCRegUnitIterator RI;
+ MCRegUnitRootIterator RRI;
+ MCSuperRegIterator SI;
+ };
- MCRegUnitIterator RI;
- MCRegUnitRootIterator RRI;
- MCSuperRegIterator SI;
+ struct CachedData {
+ ArrayRef<MCPhysReg> Aliases;
+ // Index into Aliases. The actual index is (Idx - IncludeSelf) because, when
+ // IncludeSelf is used, we use zero to represent self.
+ unsigned Idx;
+ };
+
+ union {
+ UncachedData Iters;
+ CachedData Cache;
+ };
public:
MCRegAliasIterator(MCRegister Reg, const MCRegisterInfo *MCRI,
- bool IncludeSelf)
- : Reg(Reg), MCRI(MCRI), IncludeSelf(IncludeSelf) {
+ bool IncludeSelf, bool UseCache = true)
+ : Reg(Reg), MCRI(MCRI), IncludeSelf(IncludeSelf) {
+
+ IsCached = UseCache;
+ if (IsCached) {
+ Cache.Aliases = MCRI->getCachedAliasesOf(Reg);
+ Cache.Idx = 0;
+ return;
+ }
+
// Initialize the iterators.
- for (RI = MCRegUnitIterator(Reg, MCRI); RI.isValid(); ++RI) {
- for (RRI = MCRegUnitRootIterator(*RI, MCRI); RRI.isValid(); ++RRI) {
- for (SI = MCSuperRegIterator(*RRI, MCRI, true); SI.isValid(); ++SI) {
- if (!(!IncludeSelf && Reg == *SI))
+ for (Iters.RI = MCRegUnitIterator(Reg, MCRI); Iters.RI.isValid();
+ ++Iters.RI) {
+ for (Iters.RRI = MCRegUnitRootIterator(*Iters.RI, MCRI);
+ Iters.RRI.isValid(); ++Iters.RRI) {
+ for (Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
+ Iters.SI.isValid(); ++Iters.SI) {
+ if (!(!IncludeSelf && Reg == *Iters.SI))
return;
}
}
}
}
- bool isValid() const { return RI.isValid(); }
+ bool isValid() const {
+ return IsCached ? (Cache.Idx < (Cache.Aliases.size() + IncludeSelf))
+ : Iters.RI.isValid();
+ }
MCRegister operator*() const {
- assert(SI.isValid() && "Cannot dereference an invalid iterator.");
- return *SI;
+ if (IsCached) {
+ if (IncludeSelf && (Cache.Idx == 0))
+ return Reg;
+ return Cache.Aliases[Cache.Idx - IncludeSelf];
+ }
+
+ assert(Iters.SI.isValid() && "Cannot dereference an invalid iterator.");
+ return *Iters.SI;
}
void advance() {
+ if (IsCached) {
+ ++Cache.Idx;
+ return;
+ }
+
// Assuming SI is valid.
- ++SI;
- if (SI.isValid()) return;
+ ++Iters.SI;
+ if (Iters.SI.isValid())
+ return;
- ++RRI;
- if (RRI.isValid()) {
- SI = MCSuperRegIterator(*RRI, MCRI, true);
+ ++Iters.RRI;
+ if (Iters.RRI.isValid()) {
+ Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
return;
}
- ++RI;
- if (RI.isValid()) {
- RRI = MCRegUnitRootIterator(*RI, MCRI);
- SI = MCSuperRegIterator(*RRI, MCRI, true);
+ ++Iters.RI;
+ if (Iters.RI.isValid()) {
+ Iters.RRI = MCRegUnitRootIterator(*Iters.RI, MCRI);
+ Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
}
}
MCRegAliasIterator &operator++() {
assert(isValid() && "Cannot move off the end of the list.");
- do advance();
- while (!IncludeSelf && isValid() && *SI == Reg);
+ if (IsCached)
+ advance();
+ else {
+ do
+ advance();
+ while (!IncludeSelf && isValid() && *Iters.SI == Reg);
+ }
return *this;
}
};
diff --git a/llvm/lib/MC/MCRegisterInfo.cpp b/llvm/lib/MC/MCRegisterInfo.cpp
index 334655616d8db..4ce9eb40547be 100644
--- a/llvm/lib/MC/MCRegisterInfo.cpp
+++ b/llvm/lib/MC/MCRegisterInfo.cpp
@@ -20,6 +20,26 @@
using namespace llvm;
+ArrayRef<MCPhysReg> MCRegisterInfo::getCachedAliasesOf(MCPhysReg R) const {
+ if (auto It = RegAliasesCache.find(R); It != RegAliasesCache.end())
+ return It->second;
+
+ // TODO: Should we have a DenseSet instead & then convert it
+ // to vector? Or even a BitVector that's then converted to a normal
+ // MCPhysReg vector?
+ auto &Aliases = RegAliasesCache[R];
+ for (MCRegAliasIterator It(R, this, /*IncludeSelf=*/false,
+ /*UseCache=*/false);
+ It.isValid(); ++It) {
+ Aliases.push_back(*It);
+ }
+
+ llvm::sort(Aliases);
+ Aliases.erase(unique(Aliases), Aliases.end());
+ Aliases.shrink_to_fit();
+ return Aliases;
+}
+
MCRegister
MCRegisterInfo::getMatchingSuperReg(MCRegister Reg, unsigned SubIdx,
const MCRegisterClass *RC) const {
diff --git a/llvm/test/CodeGen/ARM/constant-island-movwt.mir b/llvm/test/CodeGen/ARM/constant-island-movwt.mir
index 7d21a4e4875c3..16d2c478cf7ee 100644
--- a/llvm/test/CodeGen/ARM/constant-island-movwt.mir
+++ b/llvm/test/CodeGen/ARM/constant-island-movwt.mir
@@ -898,13 +898,13 @@ body: |
# CHECK-NEXT: CONSTPOOL_ENTRY 1, %const.0, 4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: bb.2.entry (align 2):
-# CHECK-NEXT: liveins: $d13, $s27, $r10, $r9, $r8, $s26, $d12, $s25, $s24,
-# CHECK-SAME: $d15, $s30, $s31, $d14, $s28, $s29, $lr, $r0, $d21,
-# CHECK-SAME: $r3, $q10, $d20, $d17, $r2, $d25, $q11, $d22, $d23,
-# CHECK-SAME: $r1, $q8, $d16, $s3, $q14, $d28, $d29, $d19, $s17,
-# CHECK-SAME: $d8, $s16, $r6, $r7, $r4, $q12, $q9, $d18, $s0, $q15,
-# CHECK-SAME: $d30, $d31, $r12, $s1, $d0, $d24, $s2, $d1, $q0, $s6,
-# CHECK-SAME: $d3, $d2, $s4, $q1, $s7, $s5, $d9, $s18, $s19, $q4
+# CHECK-NEXT: liveins: $d13, $s26, $r10, $r9, $r8, $s27, $d12, $s24, $s25,
+# CHECK-SAME: $d15, $s30, $s31, $d14, $s28, $s29, $lr, $d21, $r0,
+# CHECK-SAME: $q10, $r7, $d20, $d17, $r2, $d25, $q11, $d22, $d23,
+# CHECK-SAME: $s1, $q8, $d16, $s3, $q14, $d28, $d29, $d19, $s16,
+# CHECK-SAME: $r4, $d8, $r6, $r3, $q12, $s17, $q9, $d18, $s0, $d31,
+# CHECK-SAME: $q15, $d30, $r1, $d0, $r12, $d24, $s2, $d1, $q0, $s7,
+# CHECK-SAME: $s4, $d2, $s6, $q1, $s5, $d3, $d9, $s18, $s19, $q4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: $r5 = t2MOVi16 target-flags(arm-lo16) @.str.84, 14 /* CC::al */, $noreg
# CHECK-NEXT: $r5 = t2MOVTi16 $r5, target-flags(arm-hi16) @.str.84, 14 /* CC::al */, $noreg
>From 073e1a9b4545aff1eb465e27a6e6a89862e37e03 Mon Sep 17 00:00:00 2001
From: pvanhout <pierre.vanhoutryve at amd.com>
Date: Mon, 3 Jun 2024 15:22:50 +0200
Subject: [PATCH 2/2] refactor impl
---
llvm/include/llvm/MC/MCRegisterInfo.h | 121 ++----------------
llvm/lib/MC/MCRegisterInfo.cpp | 73 ++++++++++-
.../CodeGen/ARM/constant-island-movwt.mir | 15 ++-
3 files changed, 87 insertions(+), 122 deletions(-)
diff --git a/llvm/include/llvm/MC/MCRegisterInfo.h b/llvm/include/llvm/MC/MCRegisterInfo.h
index 79df17d33e420..42419cd2b18be 100644
--- a/llvm/include/llvm/MC/MCRegisterInfo.h
+++ b/llvm/include/llvm/MC/MCRegisterInfo.h
@@ -727,125 +727,30 @@ class MCRegUnitRootIterator {
}
};
-/// MCRegAliasIterator enumerates all registers aliasing Reg. If IncludeSelf is
-/// set, Reg itself is included in the list. This iterator does not guarantee
-/// any ordering or that entries are unique.
-///
-/// This iterator can work in two modes: cached and uncached.
-///
-/// In Uncached mode, this uses RegUnit/RegUnitRoot/SuperReg iterators to
-/// find all aliases. This is very expensive if the target (such as AMDGPU)
-/// has a lot of register and register aliases. It can also cause us
-/// to iterate the same register many times over.
-///
-/// In cached mode, the iterator requests MCRegisterInfo for a cache.
-/// MCRegisterInfo then returns a cached list of sorted, uniqued
-/// aliases for that register. This allows the iterator to be faster
-/// past the first request, but also to iterate much less in some
-/// cases, giving a slight speedup to any code that's using it.
+/// MCRegAliasIterator enumerates all registers aliasing Reg.
class MCRegAliasIterator {
private:
- MCRegister Reg;
- const MCRegisterInfo *MCRI;
- bool IncludeSelf : 1;
- bool IsCached : 1;
-
- /// TODO: Should we dynamically allocate this? If we don't make caching
- /// specific for each target, we probably should in order to keep the size of
- /// this object under control. RegAliasIterator is currently 80 bytes.
- struct UncachedData {
- MCRegUnitIterator RI;
- MCRegUnitRootIterator RRI;
- MCSuperRegIterator SI;
- };
-
- struct CachedData {
- ArrayRef<MCPhysReg> Aliases;
- // Index into Aliases. The actual index is (Idx - IncludeSelf) because, when
- // IncludeSelf is used, we use zero to represent self.
- unsigned Idx;
- };
-
- union {
- UncachedData Iters;
- CachedData Cache;
- };
+ const MCPhysReg *It = nullptr;
+ const MCPhysReg *End = nullptr;
public:
MCRegAliasIterator(MCRegister Reg, const MCRegisterInfo *MCRI,
- bool IncludeSelf, bool UseCache = true)
- : Reg(Reg), MCRI(MCRI), IncludeSelf(IncludeSelf) {
-
- IsCached = UseCache;
- if (IsCached) {
- Cache.Aliases = MCRI->getCachedAliasesOf(Reg);
- Cache.Idx = 0;
- return;
- }
-
- // Initialize the iterators.
- for (Iters.RI = MCRegUnitIterator(Reg, MCRI); Iters.RI.isValid();
- ++Iters.RI) {
- for (Iters.RRI = MCRegUnitRootIterator(*Iters.RI, MCRI);
- Iters.RRI.isValid(); ++Iters.RRI) {
- for (Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
- Iters.SI.isValid(); ++Iters.SI) {
- if (!(!IncludeSelf && Reg == *Iters.SI))
- return;
- }
- }
- }
- }
-
- bool isValid() const {
- return IsCached ? (Cache.Idx < (Cache.Aliases.size() + IncludeSelf))
- : Iters.RI.isValid();
- }
-
- MCRegister operator*() const {
- if (IsCached) {
- if (IncludeSelf && (Cache.Idx == 0))
- return Reg;
- return Cache.Aliases[Cache.Idx - IncludeSelf];
- }
-
- assert(Iters.SI.isValid() && "Cannot dereference an invalid iterator.");
- return *Iters.SI;
+ bool IncludeSelf) {
+ ArrayRef<MCPhysReg> Cache = MCRI->getCachedAliasesOf(Reg);
+ assert(Cache.back() == Reg);
+ It = Cache.begin();
+ End = Cache.end();
+ if (!IncludeSelf)
+ --End;
}
- void advance() {
- if (IsCached) {
- ++Cache.Idx;
- return;
- }
+ bool isValid() const { return It != End; }
- // Assuming SI is valid.
- ++Iters.SI;
- if (Iters.SI.isValid())
- return;
-
- ++Iters.RRI;
- if (Iters.RRI.isValid()) {
- Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
- return;
- }
-
- ++Iters.RI;
- if (Iters.RI.isValid()) {
- Iters.RRI = MCRegUnitRootIterator(*Iters.RI, MCRI);
- Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
- }
- }
+ MCRegister operator*() const { return *It; }
MCRegAliasIterator &operator++() {
assert(isValid() && "Cannot move off the end of the list.");
- if (IsCached)
- advance();
- else {
- do
- advance();
- while (!IncludeSelf && isValid() && *Iters.SI == Reg);
- }
+ ++It;
return *this;
}
};
diff --git a/llvm/lib/MC/MCRegisterInfo.cpp b/llvm/lib/MC/MCRegisterInfo.cpp
index 4ce9eb40547be..c71b9e33b15e1 100644
--- a/llvm/lib/MC/MCRegisterInfo.cpp
+++ b/llvm/lib/MC/MCRegisterInfo.cpp
@@ -20,22 +20,81 @@
using namespace llvm;
+namespace {
+/// MCRegAliasIterator enumerates all registers aliasing Reg. This iterator
+/// does not guarantee any ordering or that entries are unique.
+class MCRegAliasIteratorImpl {
+private:
+ MCRegister Reg;
+ const MCRegisterInfo *MCRI;
+
+ MCRegUnitIterator RI;
+ MCRegUnitRootIterator RRI;
+ MCSuperRegIterator SI;
+
+public:
+ MCRegAliasIteratorImpl(MCRegister Reg, const MCRegisterInfo *MCRI)
+ : Reg(Reg), MCRI(MCRI) {
+
+ // Initialize the iterators.
+ for (RI = MCRegUnitIterator(Reg, MCRI); RI.isValid(); ++RI) {
+ for (RRI = MCRegUnitRootIterator(*RI, MCRI); RRI.isValid(); ++RRI) {
+ for (SI = MCSuperRegIterator(*RRI, MCRI, true); SI.isValid(); ++SI) {
+ if (Reg != *SI)
+ return;
+ }
+ }
+ }
+ }
+
+ bool isValid() const { return RI.isValid(); }
+
+ MCRegister operator*() const {
+ assert(SI.isValid() && "Cannot dereference an invalid iterator.");
+ return *SI;
+ }
+
+ void advance() {
+ // Assuming SI is valid.
+ ++SI;
+ if (SI.isValid())
+ return;
+
+ ++RRI;
+ if (RRI.isValid()) {
+ SI = MCSuperRegIterator(*RRI, MCRI, true);
+ return;
+ }
+
+ ++RI;
+ if (RI.isValid()) {
+ RRI = MCRegUnitRootIterator(*RI, MCRI);
+ SI = MCSuperRegIterator(*RRI, MCRI, true);
+ }
+ }
+
+ MCRegAliasIteratorImpl &operator++() {
+ assert(isValid() && "Cannot move off the end of the list.");
+ do
+ advance();
+ while (isValid() && *SI == Reg);
+ return *this;
+ }
+};
+} // namespace
+
ArrayRef<MCPhysReg> MCRegisterInfo::getCachedAliasesOf(MCPhysReg R) const {
if (auto It = RegAliasesCache.find(R); It != RegAliasesCache.end())
return It->second;
- // TODO: Should we have a DenseSet instead & then convert it
- // to vector? Or even a BitVector that's then converted to a normal
- // MCPhysReg vector?
auto &Aliases = RegAliasesCache[R];
- for (MCRegAliasIterator It(R, this, /*IncludeSelf=*/false,
- /*UseCache=*/false);
- It.isValid(); ++It) {
+ for (MCRegAliasIteratorImpl It(R, this); It.isValid(); ++It)
Aliases.push_back(*It);
- }
llvm::sort(Aliases);
Aliases.erase(unique(Aliases), Aliases.end());
+ // Always put "self" at the end, so the iterator can choose to ignore it.
+ Aliases.push_back(R);
Aliases.shrink_to_fit();
return Aliases;
}
diff --git a/llvm/test/CodeGen/ARM/constant-island-movwt.mir b/llvm/test/CodeGen/ARM/constant-island-movwt.mir
index 16d2c478cf7ee..7b3e59eca8472 100644
--- a/llvm/test/CodeGen/ARM/constant-island-movwt.mir
+++ b/llvm/test/CodeGen/ARM/constant-island-movwt.mir
@@ -898,13 +898,14 @@ body: |
# CHECK-NEXT: CONSTPOOL_ENTRY 1, %const.0, 4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: bb.2.entry (align 2):
-# CHECK-NEXT: liveins: $d13, $s26, $r10, $r9, $r8, $s27, $d12, $s24, $s25,
-# CHECK-SAME: $d15, $s30, $s31, $d14, $s28, $s29, $lr, $d21, $r0,
-# CHECK-SAME: $q10, $r7, $d20, $d17, $r2, $d25, $q11, $d22, $d23,
-# CHECK-SAME: $s1, $q8, $d16, $s3, $q14, $d28, $d29, $d19, $s16,
-# CHECK-SAME: $r4, $d8, $r6, $r3, $q12, $s17, $q9, $d18, $s0, $d31,
-# CHECK-SAME: $q15, $d30, $r1, $d0, $r12, $d24, $s2, $d1, $q0, $s7,
-# CHECK-SAME: $s4, $d2, $s6, $q1, $s5, $d3, $d9, $s18, $s19, $q4
+# CHECK-NEXT: liveins: $s26, $s27, $r10, $r9, $r8, $d13, $s24, $s25,
+# CHECK-SAME: $d12, $d15, $s30, $s31, $d14, $s28, $s29, $lr,
+# CHECK-SAME: $d21, $q10, $r7, $r0, $d20, $d17, $r2, $q12,
+# CHECK-SAME: $q11, $d22, $d23, $r1, $q8, $d16, $d30, $q14,
+# CHECK-SAME: $d28, $d29, $d19, $s17, $r4, $d8, $r6, $r3,
+# CHECK-SAME: $s16, $d25, $q9, $d18, $s0, $d31, $s3, $q15,
+# CHECK-SAME: $r12, $d0, $s1, $d24, $d1, $s2, $q0, $s5, $d2,
+# CHECK-SAME: $q1, $s4, $s7, $d3, $s6, $d9, $s18, $s19, $q4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: $r5 = t2MOVi16 target-flags(arm-lo16) @.str.84, 14 /* CC::al */, $noreg
# CHECK-NEXT: $r5 = t2MOVTi16 $r5, target-flags(arm-hi16) @.str.84, 14 /* CC::al */, $noreg
More information about the llvm-commits
mailing list