<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">One more vein of thought re constexpr evaluation speed, the new constexpr interpreter, and in particular how the C++ standard could be tweaked to allow us to use existing LLVM optimizations to really turbocharge code with heavy constexpr usage.</div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Caveat: my knowledge is mostly about the AST; I know very little about the ABI/calling conventions/LLVM etc, which are implicated here.</div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Motivation: it seems to me that constexpr evaluations will get very complex in the future, and we should plan for it. In particular, I have been fighting a bit of a battle on the SG7 list to ensure the reflection + injection facilities will be sufficiently general to allow most design patterns to be rendered obsolete via metafunctions, and I think we came to a rough agreement that this is indeed a worthy and viable goal. </div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">But this will involve some very complex metaprogramming. Indeed, constexpr programming may well become more complex than non-constexpr programming once reflection + injection get involved, and in fact *that would be the ideal*. 
Let the user automate the tasks which make C++ programming complex, but via customizable libraries from which they can pick and choose and modify, rather than endless fixed appendages to the language.</div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">But as users constexpr all the things, so must compilers consider how best to constexpr all their optimizations. Why write constexpr-specific versions of optimizations that already exist in LLVM?</div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">The RFC for the new constexpr interpreter has some good discussion, and seems to present an opportunity. I’m looking at the comments here: </div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><a href="https://lists.llvm.org/pipermail/cfe-dev/2019-July/062807.html" class="">https://lists.llvm.org/pipermail/cfe-dev/2019-July/062807.html</a>, and specifically this response to Richard from Nandor:</div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: Courier;" class=""><span style="font-kerning: none" class="">> ><i class=""> * The current APValue representation is extremely bloated. 
Each instance is 72 bytes, and every subobject of an object is stored as a distinct APValue, so for instance a single char[128] variable will often occupy 9288 bytes of storage and incurs 128 distinct memory allocations.</i></span></div><div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: Courier;" class=""><span style="font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;" class="">> </span><span style="font-kerning: none" class="">Arrays and structures will be stored in a compact, contiguous form in memory, so we can save a lot of space here.</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Suppose we go a short step further and store that data in compliance with the ABI, just as if it were run-time data (with the proper alignment, offsets, etc. of subobjects). </div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Then, suppose this were permissible code:</div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">```</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">// foo.h</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">struct A { constexpr A() {} … };</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">constexpr int foo(A a);</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><font face="Menlo" class=""><br 
class=""></font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">// foo.cpp</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">constexpr int foo(A a) { </font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class=""> // Complicated functions, lots of loops etc. which</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class=""> // can be well-optimized by LLVM…</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">}</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><font face="Menlo" class=""><br class=""></font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">// main.cpp (must be compiled AFTER foo.cpp, or error)</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">#include "foo.h"</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">template<int N> class Dummy {};</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">int main() {</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class=""> constexpr A a{};</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class=""> Dummy<foo(a)> d;</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><font face="Menlo" class="">}</font></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">```</div><div style="margin: 0px; font-stretch: 
normal; line-height: normal;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">When asked to evaluate `foo(a)`, the interpreter would determine that this function has already been fully compiled to a binary *and* was marked constexpr (so we know there is no funny business in the definition), and could therefore simply call that function directly, in whatever manner LLVM would call such a function, i.e. passing the raw data in accordance with the ABI.</div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">This would seemingly allow us to benefit from optimizations written only for lowered code, for any particularly complicated constexpr code that warrants it. The user need only a) put the definitions in separate cpps/libraries, and b) ensure they are built before any translation units which depend on them. For trivial constexpr evaluation, nothing need change.</div><div style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Thoughts? 
Is this remotely viable/sensible?</div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><br class=""></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class="">Dave</div><div><br class=""><blockquote type="cite" class=""><div class="">On Mar 2, 2021, at 11:28 PM, David Rector <<a href="mailto:davrecthreads@gmail.com" class="">davrecthreads@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class="Apple-interchange-newline"><br class=""><blockquote type="cite" class=""><div class="">On Mar 2, 2021, at 10:03 PM, Richard Smith <<a href="mailto:richard@metafoo.co.uk" class="">richard@metafoo.co.uk</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div dir="ltr" class="">On Tue, 2 Mar 2021 at 16:46, David Rector via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a>> wrote:<br class=""></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div class="" style="overflow-wrap: break-word;"><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">I was reading <a 
href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1240r1.pdf" target="_blank" class="">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1240r1.pdf</a>, which describes proposed reflection features currently under advanced consideration. </div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">On page 5, the authors give their rationale for defining `<span class="" style="font-stretch: normal; line-height: normal; font-family: Menlo;">reflexpr(x)</span>` to return an object of an opaque builtin type called `<span class="" style="font-stretch: normal; line-height: normal; font-family: Menlo;">meta::info</span>`, which is useful only when passed to other builtin functions able to access its various properties for different reflected entities. This design is favored over the alternative of defining it to return an AST-class-like object specific to each reflected entity (ReflectedDeclaration, ReflectedStatement etc.). </div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">In other words given this design the user must write e.g. `<span class="" style="font-stretch: normal; line-height: normal; font-family: Menlo;">meta::parameters_of(reflexpr(somefunc))</span>` instead of e.g. 
`<span class="" style="font-stretch: normal; line-height: normal; font-family: Menlo;">reflexpr(somefunc)->parameters()</span>`.</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">One rationale the authors give for this choice is that they found that accessing subobjects of a constexpr class object is significantly slower than accessing values which are not subobjects, all else being equal. The authors present an example on pp 5-6. </div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">I tried to reproduce this example and their results in CompilerExplorer, and was mostly just shocked at the apparent orders-of-magnitude-differences between GCC (by far the fastest), Clang, and MSVC (by far the slowest) at both constant-evaluation tasks.</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">Example A (NB `<span class="" style="font-stretch: normal; line-height: normal; font-family: Menlo;">f()</span>` deals only with complete objects):</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;"><a href="https://godbolt.org/z/TM5Wb6" target="_blank" class="">https://godbolt.org/z/TM5Wb6</a></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">Example B (same as A, except `<span class="" style="font-stretch: normal; line-height: normal; font-family: Menlo;">f()</span>` now defined to dig through subobjects to get the data it needs):</div><div class="" style="margin: 0px; 
font-stretch: normal; line-height: normal;"><a href="https://godbolt.org/z/5dPq3W" target="_blank" class="">https://godbolt.org/z/5dPq3W</a></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">Results of 5 trials:</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo;"> <span class="Apple-converted-space"> </span>__Compilation_times__</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo;">A</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo;">GCC (1793B): 337, 408, 325, 435, 242 ms</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo;">Clang (218B): 9361, 5173, 4066, 5698, 4263 ms</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo;">MSVC (306B): 21850, 24616, 24957, 24925, 32323 ms</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo;">B</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo;">GCC (2295B): 471, 319, 403, 309, 323 ms</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo;">Clang (218B): 17073, 15540, 17281, 13385, 18540 ms</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo;">MSVC (n/a): always >60k ms</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">Takeaways:</div><div class="" 
style="margin: 0px; font-stretch: normal; line-height: normal;">1. Clang performs constant evaluation 10-50 times slower than GCC.</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">2. While Clang performs B ~3 times slower than it performs A, it is not clear that GCC is likewise affected by having to dig through subobjects (if it is, the effect is slight).</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal; min-height: 14px;"><br class=""></div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">Questions:</div><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">1. What am I missing? Are there flags which might improve clang’s performance? Is GCC somehow gaining an unfair advantage? (Potential clue: note that the executable is the same small size, 218 bytes, for each of clang’s results, but it is larger and differs for GCC’s results…meaningful?) </div></div></blockquote><div class=""><br class=""></div><div class="">Yes, GCC is "cheating" (you're not testing what you think you are). GCC memoizes constant evaluations (at least when it's correct to do so). Here's a slightly modified version of your A that doesn't permit memoization: <a href="https://godbolt.org/z/3q3TYr" class="">https://godbolt.org/z/3q3TYr</a></div><div class=""><br class=""></div><div class="">5 trials with that and GCC (1793B): 12340, 11106, 10204, 9983, 10771ms</div><div class=""><br class=""></div><div class="">(Times for Clang and MSVC seem similar to your measurements.) 
I don't know if compiler explorer uses the same machines for all compiles, or if all compilers are built in fully-optimized mode, but if so, that suggests that Clang is about 2x faster than GCC for this particular testcase.</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div class="" style="overflow-wrap: break-word;"><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">2. Given the "constexpr all the things" zeitgeist, and the constant evaluation speeds GCC has apparently realized, should the design of ExprConstant.cpp/APValue/etc. be reconsidered?</div></div></blockquote><div class=""><br class=""></div><div class="">Ignoring the part about GCC, yes. We have a -fexperimental-new-constant-interpreter flag that enables a new interpreter, which was built to be substantially faster. Unfortunately it's not complete yet (the version in trunk doesn't support any looping constructs yet) but the early indications are very promising.</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div class="" style="overflow-wrap: break-word;"><div class="" style="margin: 0px; font-stretch: normal; line-height: normal;">3. If ExprConstant.cpp etc were overhauled for optimal speed, will it still ultimately be true that programs which dig through subobjects of compile time objects are necessarily slower than equivalent programs which deal only with complete compile-time objects?</div></div></blockquote><div class=""><br class=""></div><div class="">I don't think that would necessarily be the case. 
I've also mentioned that in committee, but "one compiler can do X" doesn't necessarily translate into "everyone will do X", and folks representing other compilers have indicated they expect the more "bare-bones" approach will remain faster for their implementations.</div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div class="" style="overflow-wrap: break-word;"><div class=""><br class=""></div></div></blockquote></div></div></div></blockquote><br class=""></div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Thanks Richard, very thorough answer; that adjustment does indeed turn the tables on GCC. Clang takes the gold, and that’s before even the fancy new interpreter is deployed. 
Looking forward to it.</div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class=""></div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Assuming like machines are running the various compilers on CompilerExplorer, I think we can all hazard a pretty good guess at this point which particular compiler really benefits from keeping things as bare bones as possible ;)</div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class=""></div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Dave</div></div></blockquote></div><br class=""></body></html>