[llvm] [SROA] Use SmallPtrSet for PromotableAllocas (PR #105809)
Bartłomiej Chmiel via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 23 02:56:56 PDT 2024
https://github.com/b-chmiel created https://github.com/llvm/llvm-project/pull/105809
When compiling large SystemVerilog designs transpiled by https://github.com/verilator/verilator, `clang` compilation hangs during the SROA phase.
The [PromotableAllocas](https://github.com/llvm/llvm-project/blob/57dc09341e5eef758b1abce78822c51069157869/llvm/lib/Transforms/Scalar/SROA.cpp#L201) field is represented as a `std::vector`. In our case its size is close to 500 000 000, which makes the [random search-delete](https://github.com/llvm/llvm-project/blob/57dc09341e5eef758b1abce78822c51069157869/llvm/lib/Transforms/Scalar/SROA.cpp#L5615) on `std::vector` inefficient.
Assuming that PromotableAllocas contains only unique raw pointers to allocas, SmallPtrSet may be used for storing them.
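To illustrate why the container choice matters, here is a minimal standalone sketch. It is not the SROA code: `std::unordered_set` stands in for `SmallPtrSet`, and the empty `Alloca` struct and all function names are hypothetical placeholders for `llvm::AllocaInst` and the pass logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for llvm::AllocaInst; only its address matters here.
struct Alloca {};

// Vector variant: every removal batch scans all stored pointers, so the cost
// is proportional to the container size even if only a few entries go away.
void eraseFromVector(std::vector<Alloca *> &Promotable,
                     const std::unordered_set<Alloca *> &Deleted) {
  Promotable.erase(
      std::remove_if(Promotable.begin(), Promotable.end(),
                     [&](Alloca *A) { return Deleted.count(A) != 0; }),
      Promotable.end());
}

// Set variant: each removal is an expected constant-time hash lookup, so the
// cost of a batch is proportional to the number of deleted entries only.
void eraseFromSet(std::unordered_set<Alloca *> &Promotable,
                  const std::unordered_set<Alloca *> &Deleted) {
  for (Alloca *A : Deleted)
    Promotable.erase(A);
}

// Runs both variants on the same input and reports whether they end up
// holding the same pointers.
bool variantsAgree(std::size_t Count) {
  std::vector<Alloca> Storage(Count);
  std::vector<Alloca *> AsVector;
  std::unordered_set<Alloca *> AsSet;
  for (Alloca &A : Storage) {
    AsVector.push_back(&A);
    AsSet.insert(&A);
  }

  // Mark every third entry as deleted.
  std::unordered_set<Alloca *> Deleted;
  for (std::size_t I = 0; I < Count; I += 3)
    Deleted.insert(&Storage[I]);

  eraseFromVector(AsVector, Deleted);
  eraseFromSet(AsSet, Deleted);

  return AsVector.size() == AsSet.size() &&
         std::all_of(AsVector.begin(), AsVector.end(),
                     [&](Alloca *A) { return AsSet.count(A) != 0; });
}
```

Both variants remove the same elements; the difference is only in how much of the container each removal batch has to touch.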
Note: I'm creating a `std::vector` from the SmallPtrSet in `SROA::promoteAllocas` to match the signature of `PromoteMemToReg`. However, the `PromoteMem2Reg` constructor makes yet another copy of this structure (using its begin/end iterators). Do you think there is a better way to optimize this? For example, using `std::move` and adjusting the `PromoteMem2Reg` interface?
@chandlerc
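The copy question above can be pictured with a small standalone sketch. Everything in it is an assumption made for illustration: `Promoter` is a hypothetical class, not the real `PromoteMem2Reg` interface, `Alloca` is a placeholder type, and `std::unordered_set` stands in for `SmallPtrSet`. It only shows that a constructor taking an rvalue vector can adopt the temporary's buffer, while a begin/end constructor has to allocate and copy.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::AllocaInst.
struct Alloca {};

// Hypothetical consumer that keeps its own vector of the pointers it is given.
class Promoter {
public:
  // Variant A: copies from an iterator range, as a begin/end constructor does.
  template <typename It>
  Promoter(It Begin, It End) : Allocas(Begin, End) {}

  // Variant B: takes ownership of an already materialized vector.
  explicit Promoter(std::vector<Alloca *> &&Owned)
      : Allocas(std::move(Owned)) {}

  Alloca *const *data() const { return Allocas.data(); }
  std::size_t size() const { return Allocas.size(); }

private:
  std::vector<Alloca *> Allocas;
};

// Returns true if variant A allocated a fresh buffer and variant B reused the
// buffer of the temporary vector built from the set.
bool moveReusesBuffer(std::size_t Count) {
  std::vector<Alloca> Storage(Count);
  std::unordered_set<Alloca *> Set;
  for (Alloca &A : Storage)
    Set.insert(&A);

  // First copy: materialize the set as a contiguous vector.
  std::vector<Alloca *> Tmp(Set.begin(), Set.end());
  Alloca *const *Buffer = Tmp.data();

  Promoter Copied(Tmp.begin(), Tmp.end()); // second copy
  Promoter Moved(std::move(Tmp));          // no second copy

  return Copied.data() != Buffer && Moved.data() == Buffer &&
         Copied.size() == Count && Moved.size() == Count;
}
```

Whether such an interface change is appropriate for the real promotion utility is a separate design decision; the sketch only demonstrates the mechanics of avoiding the second copy.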
## Benchmarks
### Internal benchmark
The base version timed out after 9 hours; the improved version finished in 37 minutes.
### Minimal example benchmark
This test mimics our internal benchmark: it creates a large number of allocas that SROA considers.
In 10 test runs, the improved version was 6% faster than the base one.
gen.cpp - generates a test file
```cpp
#include <fstream>

int main() {
  // Number of VlWide locals to emit; each one becomes an alloca seen by SROA.
  constexpr int fields = 100000;
  std::ofstream of{"out.cpp"};
  of << "#include <random>\n";
  of << " struct VlWide final {\n";
  of << "\tstd::uint32_t m_storage[5];\n";
  of << "};\n";
  of << "int main() {\n";
  of << "\tunsigned int rnd = rand();\n";
  // Initialize every local from a runtime value so it is not folded away.
  for (auto i = 0; i < fields; ++i)
    of << "\tVlWide tmp_" << i << "{rnd};\n";
  of << "\treturn 0;\n";
  of << "}\n";
  return 0;
}
```
Makefile - compiles both the generator and the generated file
```
.PHONY: clean
out.o: out.cpp
	$(CXX) -c -O1 -emit-llvm -mllvm -stats -o $@ $<
out.cpp: gen.o
	./gen.o
gen.o: gen.cpp
	$(CXX) -O3 -o $@ $<
clean:
	-rm *.o out.cpp
```
Run with `make`.
### llvm-test-suite
CTMark `compile_time` results over 100 runs for the base (commit https://github.com/llvm/llvm-project/commit/b05c55472bf7cadcd0e4cb1a669b3474695b0524) and improved `clang` versions:
```
Program                              compile_time
                                     base    improved  diff
tramp3d-v4/tramp3d-v4                 6.13    6.28      2.4%
sqlite3/sqlite3                       1.21    1.24      2.1%
lencod/lencod                         4.43    4.45      0.5%
SPASS/SPASS                           5.84    5.87      0.4%
Bullet/bullet                        27.64   27.58     -0.2%
consumer-typeset/consumer-typeset     4.37    4.34     -0.8%
7zip/7zip-benchmark                  71.99   71.43     -0.8%
ClamAV/clamscan                       5.42    5.38     -0.8%
kimwitu++/kc                         11.64   11.52     -1.0%
mafft/pairlocalalign                  2.31    2.27     -1.6%
Geomean difference                                      0.0%

       compile_time
l/r    base        improved    diff
count  10.000000   10.000000   10.000000
mean   14.099270   14.034550    0.000112
std    21.708482   21.533859    0.013314
min     1.211500    1.236500   -0.016101
25%     4.384250    4.364425   -0.007946
50%     5.632550    5.621750   -0.004951
75%    10.262475   10.211450    0.004309
max    71.992200   71.425300    0.024213
```
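As a rough cross-check of the "Geomean difference" row, the geometric mean of the improved/base ratios can be recomputed from the table. The sketch below is an assumption about how such a figure is derived, not the test-suite's own tooling, and it uses the rounded two-decimal values shown above, so the result may differ slightly from the reported one. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Geometric mean of improved/base ratios; 1.0 means no overall change.
double geomeanRatio(const std::vector<std::pair<double, double>> &Rows) {
  double LogSum = 0.0;
  for (const auto &Row : Rows)
    LogSum += std::log(Row.second / Row.first);
  return std::exp(LogSum / static_cast<double>(Rows.size()));
}

// (base, improved) compile_time pairs copied from the rounded table values.
std::vector<std::pair<double, double>> ctmarkRows() {
  return {{6.13, 6.28},   {1.21, 1.24},   {4.43, 4.45},   {5.84, 5.87},
          {27.64, 27.58}, {4.37, 4.34},   {71.99, 71.43}, {5.42, 5.38},
          {11.64, 11.52}, {2.31, 2.27}};
}
```

Calling `geomeanRatio(ctmarkRows())` yields a value within a fraction of a percent of 1.0, consistent with the table's conclusion that the change is neutral on CTMark.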
From 83e73d70c16839869936c50842c8aeac680ec836 Mon Sep 17 00:00:00 2001
From: Bartłomiej Chmiel <bchmiel at antmicro.com>
Date: Thu, 22 Aug 2024 14:56:21 +0200
Subject: [PATCH] [SROA] Use SmallPtrSet for PromotableAllocas
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Optimize the SROA pass for a large number of allocas by
speeding up the PromotableAllocas erase operation. The optimization
uses SmallPtrSet, which is efficient here because
PromotableAllocas only ever holds unique pointers.
Signed-off-by: Bartłomiej Chmiel <bchmiel at antmicro.com>
---
llvm/lib/Transforms/Scalar/SROA.cpp | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index 26b62cb79cdedf..e13dfed5adb458 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -198,7 +198,7 @@ class SROA {
SmallSetVector<AllocaInst *, 16> PostPromotionWorklist;
/// A collection of alloca instructions we can directly promote.
- std::vector<AllocaInst *> PromotableAllocas;
+ SmallPtrSet<AllocaInst *, 16> PromotableAllocas;
/// A worklist of PHIs to speculate prior to promoting allocas.
///
@@ -4769,9 +4769,8 @@ bool SROA::presplitLoadsAndStores(AllocaInst &AI, AllocaSlices &AS) {
// Finally, don't try to promote any allocas that new require re-splitting.
// They have already been added to the worklist above.
- llvm::erase_if(PromotableAllocas, [&](AllocaInst *AI) {
- return ResplitPromotableAllocas.count(AI);
- });
+ for (auto *RPA : ResplitPromotableAllocas)
+ PromotableAllocas.erase(RPA);
return true;
}
@@ -4933,7 +4932,7 @@ AllocaInst *SROA::rewritePartition(AllocaInst &AI, AllocaSlices &AS,
}
if (PHIUsers.empty() && SelectUsers.empty()) {
// Promote the alloca.
- PromotableAllocas.push_back(NewAI);
+ PromotableAllocas.insert(NewAI);
} else {
// If we have either PHIs or Selects to speculate, add them to those
// worklists and re-queue the new alloca so that we promote in on the
@@ -5568,7 +5567,9 @@ bool SROA::promoteAllocas(Function &F) {
LLVM_DEBUG(dbgs() << "Not promoting allocas with mem2reg!\n");
} else {
LLVM_DEBUG(dbgs() << "Promoting allocas with mem2reg...\n");
- PromoteMemToReg(PromotableAllocas, DTU->getDomTree(), AC);
+ PromoteMemToReg(
+ std::vector(PromotableAllocas.begin(), PromotableAllocas.end()),
+ DTU->getDomTree(), AC);
}
PromotableAllocas.clear();
@@ -5585,7 +5586,7 @@ std::pair<bool /*Changed*/, bool /*CFGChanged*/> SROA::runSROA(Function &F) {
if (AllocaInst *AI = dyn_cast<AllocaInst>(I)) {
if (DL.getTypeAllocSize(AI->getAllocatedType()).isScalable() &&
isAllocaPromotable(AI))
- PromotableAllocas.push_back(AI);
+ PromotableAllocas.insert(AI);
else
Worklist.insert(AI);
}
@@ -5609,10 +5610,10 @@ std::pair<bool /*Changed*/, bool /*CFGChanged*/> SROA::runSROA(Function &F) {
// Remove the deleted allocas from various lists so that we don't try to
// continue processing them.
if (!DeletedAllocas.empty()) {
- auto IsInSet = [&](AllocaInst *AI) { return DeletedAllocas.count(AI); };
- Worklist.remove_if(IsInSet);
- PostPromotionWorklist.remove_if(IsInSet);
- llvm::erase_if(PromotableAllocas, IsInSet);
+ Worklist.set_subtract(DeletedAllocas);
+ PostPromotionWorklist.set_subtract(DeletedAllocas);
+ for (auto *DA : DeletedAllocas)
+ PromotableAllocas.erase(DA);
DeletedAllocas.clear();
}
}