[cfe-dev] Constexpr evaluation speed
David Rector via cfe-dev
cfe-dev at lists.llvm.org
Tue Mar 2 16:45:53 PST 2021
I was reading http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1240r1.pdf, which describes proposed reflection features currently under advanced consideration.
On page 5, the authors give their rationale for defining `reflexpr(x)` to return an object of an opaque builtin type called `meta::info`, which is useful only when passed to other builtin functions able to access its various properties for different reflected entities. This design is favored over the alternative of defining it to return an AST-class-like object specific to each reflected entity (ReflectedDeclaration, ReflectedStatement etc.).
In other words, given this design, the user must write e.g. `meta::parameters_of(reflexpr(somefunc))` instead of e.g. `reflexpr(somefunc)->parameters()`.
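To make the contrast concrete, here is a rough mock of the two API shapes in ordinary C++. Everything in namespace `mock` is a hypothetical stand-in invented for illustration (`info`, `parameter_count_of`, `ReflectedFunction`, and the lookup table are not P1240 names), and it models only the calling style, not real reflection:

```cpp
// Hypothetical mock of the two designs; none of these names come from P1240.
namespace mock {

// Opaque-handle style: the reflection value is just an identifier, and all
// properties live elsewhere (here, an assumed compile-time table).
struct info { int id; };
constexpr int param_counts[] = {0, 2, 3};

// Properties are obtained by passing the handle to a free function.
constexpr int parameter_count_of(info r) { return param_counts[r.id]; }

// Class-like style: the reflection object stores its properties as members,
// so every query reads a subobject of a constexpr object.
struct ReflectedFunction {
  int parameter_count;
  constexpr int parameters() const { return parameter_count; }
};

}  // namespace mock

static_assert(mock::parameter_count_of(mock::info{1}) == 2, "handle style");
static_assert(mock::ReflectedFunction{2}.parameters() == 2, "class style");
```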
One rationale the authors give for this choice is that they found that accessing subobjects of a constexpr class object is significantly slower than accessing values which are not subobjects, all else being equal. The authors present an example on pp. 5-6.
I tried to reproduce this example and their results in Compiler Explorer, and was mostly just shocked at the apparent orders-of-magnitude differences between GCC (by far the fastest), Clang, and MSVC (by far the slowest) at both constant-evaluation tasks.
Example A (NB `f()` deals only with complete objects):
Example B (same as A, except `f()` now defined to dig through subobjects to get the data it needs):
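The listings for the two examples are not reproduced here. As a rough illustration only, and not the code that was actually measured, the difference between the two variants might look like the sketch below. The types `Inner`, `Middle`, `Outer` and the functions `sum_complete` and `sum_subobject` are hypothetical names, and the sketch assumes C++14 or later for loops in constexpr functions:

```cpp
// Hypothetical illustration; not the benchmark code from this message.
struct Inner  { int value; };
struct Middle { Inner inner; };
struct Outer  { Middle middle; };

// Variant A style: the loop reads only a complete object (a plain local int).
constexpr long sum_complete(int n) {
  long total = 0;
  int v = 1;  // complete object
  for (int i = 0; i < n; ++i) total += v;
  return total;
}

// Variant B style: same arithmetic, but each read walks a chain of
// nested subobjects to reach the value.
constexpr long sum_subobject(int n) {
  long total = 0;
  Outer o{{{1}}};  // o.middle.inner.value == 1
  for (int i = 0; i < n; ++i) total += o.middle.inner.value;
  return total;
}

// Force both to be evaluated by the constant evaluator.
static_assert(sum_complete(1000) == 1000, "complete-object variant");
static_assert(sum_subobject(1000) == 1000, "subobject variant");
```

Timing a comparison like this would need a much larger iteration count, and probably a raised constexpr step limit on some compilers; the small count here only keeps the sketch within default limits.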
Results of 5 trials (executable size in parentheses):
Example A:
GCC (1793B): 337, 408, 325, 435, 242 ms
Clang (218B): 9361, 5173, 4066, 5698, 4263 ms
MSVC (306B): 21850, 24616, 24957, 24925, 32323 ms
Example B:
GCC (2295B): 471, 319, 403, 309, 323 ms
Clang (218B): 17073, 15540, 17281, 13385, 18540 ms
MSVC (n/a): always >60k ms
Observations:
1. Clang performs constant evaluation 10-50 times slower than GCC.
2. While Clang performs B ~3 times slower than it performs A, it is not clear that GCC is likewise affected by having to dig through subobjects (if it is, the effect is slight).
Questions:
1. What am I missing? Are there flags which might improve Clang’s performance? Is GCC somehow gaining an unfair advantage? (Potential clue: note that the executable is the same small size, 218 bytes, for each of Clang’s results, but it is larger and differs for GCC’s results. Is that meaningful?)
2. Given the "constexpr all the things" zeitgeist, and the constant evaluation speeds GCC has apparently realized, should the design of ExprConstant.cpp/APValue/etc. be reconsidered?
3. If ExprConstant.cpp etc. were overhauled for optimal speed, would it still ultimately be true that programs which dig through subobjects of compile-time objects are necessarily slower than equivalent programs which deal only with complete compile-time objects?
Not a pressing matter, but maybe worthy of some thought. Thanks,