<div dir="ltr"><div dir="ltr">On Tue, 2 Mar 2021 at 16:46, David Rector via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div style="margin:0px;font-stretch:normal;line-height:normal">I was reading <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1240r1.pdf" target="_blank">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1240r1.pdf</a>, which describes proposed reflection features currently under advanced consideration. </div><div style="margin:0px;font-stretch:normal;line-height:normal;min-height:14px"><br></div><div style="margin:0px;font-stretch:normal;line-height:normal">On page 5, the authors give their rationale for defining `<span style="font-stretch:normal;line-height:normal;font-family:Menlo">reflexpr(x)</span>` to return an object of an opaque builtin type called `<span style="font-stretch:normal;line-height:normal;font-family:Menlo">meta::info</span>`, which is useful only when passed to other builtin functions able to access its various properties for different reflected entities. This design is favored over the alternative of defining it to return an AST-class-like object specific to each reflected entity (ReflectedDeclaration, ReflectedStatement etc.). </div><div style="margin:0px;font-stretch:normal;line-height:normal;min-height:14px"><br></div><div style="margin:0px;font-stretch:normal;line-height:normal">In other words given this design the user must write e.g. `<span style="font-stretch:normal;line-height:normal;font-family:Menlo">meta::parameters_of(reflexpr(somefunc))</span>` instead of e.g. 
`<span style="font-stretch:normal;line-height:normal;font-family:Menlo">reflexpr(somefunc)->parameters()</span>`.</div><div style="margin:0px;font-stretch:normal;line-height:normal;min-height:14px"><br></div><div style="margin:0px;font-stretch:normal;line-height:normal">One rationale the authors give for this choice is that they found that accessing subobjects of a constexpr class object is significantly slower than accessing values which are not subobjects, all else being equal. The authors present an example on pp 5-6. </div><div style="margin:0px;font-stretch:normal;line-height:normal;min-height:14px"><br></div><div style="margin:0px;font-stretch:normal;line-height:normal">I tried to reproduce this example and their results in CompilerExplorer, and was mostly just shocked at the apparent orders-of-magnitude-differences between GCC (by far the fastest), Clang, and MSVC (by far the slowest) at both constant-evaluation tasks.</div><div style="margin:0px;font-stretch:normal;line-height:normal;min-height:14px"><br></div><div style="margin:0px;font-stretch:normal;line-height:normal">Example A (NB `<span style="font-stretch:normal;line-height:normal;font-family:Menlo">f()</span>` deals only with complete objects):</div><div style="margin:0px;font-stretch:normal;line-height:normal"><a href="https://godbolt.org/z/TM5Wb6" target="_blank">https://godbolt.org/z/TM5Wb6</a></div><div style="margin:0px;font-stretch:normal;line-height:normal;min-height:14px"><br></div><div style="margin:0px;font-stretch:normal;line-height:normal">Example B (same as A, except `<span style="font-stretch:normal;line-height:normal;font-family:Menlo">f()</span>` now defined to dig through subobjects to get the data it needs):</div><div style="margin:0px;font-stretch:normal;line-height:normal"><a href="https://godbolt.org/z/5dPq3W" target="_blank">https://godbolt.org/z/5dPq3W</a></div><div style="margin:0px;font-stretch:normal;line-height:normal;min-height:14px"><br></div><div 
style="margin:0px;font-stretch:normal;line-height:normal">Results of 5 trials:</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo"> __Compilation_times__</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo">A</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo">GCC (1793B): 337, 408, 325, 435, 242 ms</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo">Clang (218B): 9361, 5173, 4066, 5698, 4263 ms</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo">MSVC (306B): 21850, 24616, 24957, 24925, 32323 ms</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo;min-height:14px"><br></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo">B</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo">GCC (2295B): 471, 319, 403, 309, 323 ms</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo">Clang (218B): 17073, 15540, 17281, 13385, 18540 ms</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo">MSVC (n/a): always >60k ms</div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo;min-height:14px"><br></div><div style="margin:0px;font-stretch:normal;line-height:normal">Takeaways:</div><div style="margin:0px;font-stretch:normal;line-height:normal">1. Clang performs constant evaluation 10-50 times slower than GCC.</div><div style="margin:0px;font-stretch:normal;line-height:normal">2. 
While Clang performs B ~3 times slower than it performs A, it is not clear that GCC is likewise affected by having to dig through subobjects (if it is, the effect is slight).</div><div style="margin:0px;font-stretch:normal;line-height:normal;min-height:14px"><br></div><div style="margin:0px;font-stretch:normal;line-height:normal">Questions:</div><div style="margin:0px;font-stretch:normal;line-height:normal">1. What am I missing? Are there flags which might improve clang’s performance? Is GCC somehow gaining an unfair advantage? (Potential clue: note that the executable is the same small size, 218 bytes, for each of clang’s results, but it is larger and differs for GCC’s results…meaningful?) </div></div></blockquote><div><br></div><div>Yes, GCC is "cheating" (you're not testing what you think you are). GCC memoizes constant evaluations (at least when it's correct to do so). Here's a slightly modified version of your A that doesn't permit memoization: <a href="https://godbolt.org/z/3q3TYr">https://godbolt.org/z/3q3TYr</a></div><div><br></div><div>5 trials with that and GCC (1793B): 12340, 11106, 10204, 9983, 10771ms</div><div><br></div><div>(Times for Clang and MSVC seem similar to your measurements.) I don't know if compiler explorer uses the same machines for all compiles, or if all compilers are built in fully-optimized mode, but if so, that suggests that Clang is about 2x faster than GCC for this particular testcase.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div style="margin:0px;font-stretch:normal;line-height:normal">2. Given the "constexpr all the things" zeitgeist, and the constant evaluation speeds GCC has apparently realized, should the design of ExprConstant.cpp/APValue/etc. be reconsidered?</div></div></blockquote><div><br></div><div>Ignoring the part about GCC, yes. 
We have a -fexperimental-new-constant-interpreter flag that enables a new interpreter, which was built to be substantially faster. Unfortunately it's not complete yet (the version in trunk doesn't support any looping constructs yet) but the early indications are very promising.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div style="margin:0px;font-stretch:normal;line-height:normal">3. If ExprConstant.cpp etc were overhauled for optimal speed, will it still ultimately be true that programs which dig through subobjects of compile time objects are necessarily slower than equivalent programs which deal only with complete compile-time objects?</div></div></blockquote><div><br></div><div>I don't think that would necessarily be the case. I've also mentioned that in committee, but "one compiler can do X" doesn't necessarily translate into "everyone will do X", and folks representing other compilers have indicated they expect the more "bare-bones" approach will remain faster for their implementations.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div>Not a pressing matter, but maybe worthy of some thought. Thanks,</div><div><br></div><div>Dave</div></div>_______________________________________________<br>
</blockquote></div></div>