[libcxx-commits] [libcxx] Speed up compilation of common uses of std::visit() (PR #164196)
via libcxx-commits
libcxx-commits at lists.llvm.org
Fri Nov 28 08:32:13 PST 2025
higher-performance wrote:
Done.
First, re: the runtime benchmarks: I had to run them somewhat ad hoc via Google Benchmark since I don't have the official setup handy. Regardless, they indicate a speedup for < 8 elements (2 through 7 drop from about 3.2 ns to about 2.2 ns, while 1 and 8 are essentially unchanged):
Before:
```
Benchmark            Time(ns)  CPU(ns)  Iterations
BM_Visit<1, 1>_mean      2.13     2.13    25000000
BM_Visit<1, 2>_mean      3.22     3.22    25000000
BM_Visit<1, 3>_mean      3.20     3.20    25000000
BM_Visit<1, 4>_mean      3.21     3.21    25000000
BM_Visit<1, 5>_mean      3.21     3.20    25000000
BM_Visit<1, 6>_mean      3.22     3.22    25000000
BM_Visit<1, 7>_mean      3.20     3.20    25000000
BM_Visit<1, 8>_mean      3.21     3.21    25000000
```
After:
```
Benchmark            Time(ns)  CPU(ns)  Iterations
BM_Visit<1, 1>_mean      2.19     2.19    25000000
BM_Visit<1, 2>_mean      2.20     2.20    25000000
BM_Visit<1, 3>_mean      2.18     2.18    25000000
BM_Visit<1, 4>_mean      2.18     2.18    25000000
BM_Visit<1, 5>_mean      2.22     2.22    25000000
BM_Visit<1, 6>_mean      2.19     2.19    25000000
BM_Visit<1, 7>_mean      2.19     2.19    25000000
BM_Visit<1, 8>_mean      3.27     3.27    25000000
```
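For anyone who wants a rough sanity check without a Google Benchmark setup, a minimal timing sketch along these lines may be enough. It is not the `BM_Visit` harness: `Var`, `TimingResult`, and `time_visit` are hypothetical names, the variant type is only a stand-in, and there is no optimizer barrier, so the numbers are indicative at best.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <variant>

// Hypothetical stand-in for the benchmarked variant; not the types used by BM_Visit.
using Var = std::variant<char, unsigned char, int>;

struct TimingResult {
  double ns_per_call;      // average wall-clock time per std::visit call
  unsigned long long sum;  // accumulated result, returned so the work has an observable effect
};

// Times `iterations` calls to std::visit on `v` with a steady clock.
// Assumption: no optimizer barrier is used here, so an optimizing compiler
// may simplify the loop; a real benchmark framework guards against that.
TimingResult time_visit(Var v, std::size_t iterations) {
  assert(iterations > 0);
  unsigned long long sum = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i) {
    sum += std::visit(
        [](int x) { return static_cast<unsigned long long>(x); }, v);
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return {elapsed.count() / static_cast<double>(iterations), sum};
}
```

Calling `time_visit` once against the old headers and once against the new ones, with the same compiler flags, gives a coarse before/after comparison.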
As for compile-time benchmarking, I also tested it like this:
```
#include <variant>
int main(int argc, char* argv[]) {
  std::variant<char, unsigned char, int> v;
  v.emplace<0>(3);
  int n = 0;
  unsigned int r = 1;
#define X(V) \
  ++n; \
  std::visit([&](int x) { r *= x; }, V)
  (void)--n, X(v);
#ifdef NEW_VERSION
  // clang-format off
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  // clang-format on
#else
  (void)v;
#endif
#undef X
  return r % 1000 == 1 ? -1 : n;
}
```
Under `-O3` I got:
- Baseline (only 1 `std::visit` call): 5216 bytes
- 64 extra calls (new implementation): 5216 bytes, +0.1 ms
- 64 extra calls (old implementation): 54104 bytes, +0.43 ms
My setup/system is a bit different from last time, so it's not quite 8x here, but still, it's a huge win.
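A note on why the test program repeats the lambda through a macro rather than a loop: every lambda expression has its own closure type, so each `X(v)` expansion is a separate `std::visit` instantiation for the compiler to process. The sketch below illustrates that with two expansions written out by hand; `visit_twice` is a hypothetical helper, not part of the test above.

```cpp
#include <cassert>
#include <type_traits>
#include <variant>

// Hypothetical helper mirroring two hand-written expansions of X(v).
inline unsigned int visit_twice(const std::variant<char, unsigned char, int>& v) {
  assert(!v.valueless_by_exception());
  unsigned int r = 1;
  auto first  = [&](int x) { r *= x; };
  auto second = [&](int x) { r *= x; };
  // Textually identical lambdas still have distinct closure types, so the two
  // std::visit calls below are two separate template instantiations.
  static_assert(!std::is_same_v<decltype(first), decltype(second)>);
  // The single `int` parameter accepts every alternative (char, unsigned char,
  // int) through implicit conversion, so one lambda covers the whole variant.
  std::visit(first, v);
  std::visit(second, v);
  return r;
}
```

A loop around a single `std::visit` call would instantiate the template only once, which would hide the per-call-site cost the measurement is after.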
**tl;dr: it's a strict win on every axis I measured.** @philnik777
https://github.com/llvm/llvm-project/pull/164196