[cfe-dev] Constexpr evaluation speed

Tue Mar 2 19:03:22 PST 2021

On Tue, 2 Mar 2021 at 16:46, David Rector via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> I was reading
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1240r1.pdf,
> which describes proposed reflection features currently under advanced
> consideration.
>
> On page 5, the authors give their rationale for defining `reflexpr(x)` to
> return an object of an opaque builtin type called `meta::info`, which is
> useful only when passed to other builtin functions able to access its
> various properties for different reflected entities.  This design is
> favored over the alternative of defining it to return an AST-class-like
> object specific to each reflected entity (ReflectedDeclaration,
> ReflectedStatement etc.).
>
> In other words, given this design the user must write e.g.
> `meta::parameters_of(reflexpr(somefunc))` instead of e.g.
> `reflexpr(somefunc)->parameters()`.
>
> One rationale the authors give for this choice is their finding that
> accessing subobjects of a constexpr class object is significantly slower
> than accessing values which are not subobjects, all else being equal.  The
> authors present an example on pp. 5-6.
>
> I tried to reproduce this example and their results in Compiler Explorer,
> and was mostly just shocked at the apparent orders-of-magnitude differences
> between GCC (by far the fastest), Clang, and MSVC (by far the slowest) on
> both constant-evaluation tasks.
>
> Example A (NB `f()` deals only with complete objects):
> https://godbolt.org/z/TM5Wb6
>
> Example B (same as A, except `f()` now defined to dig through subobjects
> to get the data it needs):
> https://godbolt.org/z/5dPq3W
>
> Results of 5 trials (compile times in ms; executable size in parentheses):
> A
> GCC  (1793B):   337,   408,   325,   435,   242 ms
> Clang (218B):  9361,  5173,  4066,  5698,  4263 ms
> MSVC  (306B): 21850, 24616, 24957, 24925, 32323 ms
>
> B
> GCC  (2295B):   471,   319,   403,   309,   323 ms
> Clang (218B): 17073, 15540, 17281, 13385, 18540 ms
> MSVC   (n/a): always >60k ms
>
> Takeaways:
> 1. Clang performs constant evaluation 10-50 times slower than GCC.
> 2. While Clang takes ~3 times longer on B than on A, it is not clear that
> GCC is likewise affected by having to dig through subobjects (if it is, the
> effect is slight).
>
> Questions:
> 1. What am I missing?  Are there flags which might improve clang’s
> performance?  Is GCC somehow gaining an unfair advantage?  (Potential clue:
> note that the executable is the same small size, 218 bytes, for each of
> clang’s results, but it is larger and differs for GCC’s
> results…meaningful?)
>

Yes, GCC is "cheating" (you're not testing what you think you are). GCC
memoizes constant evaluations (at least when it's correct to do so). Here's
a slightly modified version of your A that doesn't permit memoization:
https://godbolt.org/z/3q3TYr

5 trials with that modified version and GCC (1793B): 12340, 11106, 10204, 9983, 10771 ms

(Times for Clang and MSVC seem similar to your measurements.) I don't know
if Compiler Explorer uses the same machines for all compiles, or if all
compilers are built fully optimized, but if both are true, that suggests
that Clang is about 2x faster than GCC for this particular testcase.

> 2. Given the "constexpr all the things" zeitgeist, and the constant
> evaluation speeds GCC has apparently realized, should the design of
> ExprConstant.cpp/APValue/etc. be reconsidered?
>

Ignoring the part about GCC, yes. We have a
-fexperimental-new-constant-interpreter flag that enables a new
interpreter, which was built to be substantially faster. Unfortunately it's
not complete (the version in trunk doesn't support any looping constructs
yet), but the early indications are very promising.

> 3. If ExprConstant.cpp etc. were overhauled for optimal speed, will it still
> ultimately be true that programs which dig through subobjects of compile
> time objects are necessarily slower than equivalent programs which deal
> only with complete compile-time objects?
>

I don't think that would necessarily be the case. I've also mentioned that
in committee, but "one compiler can do X" doesn't necessarily translate
into "everyone will do X", and folks representing other compilers have
indicated they expect the more "bare-bones" approach will remain faster for
their implementations.

> Not a pressing matter, but maybe worthy of some thought. Thanks,
>
> Dave
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>