[cfe-commits] [Patch 4 of 5] Proposed fix for PR12850
Andy Gibbs
andyg1001 at hotmail.co.uk
Tue Sep 4 07:40:00 PDT 2012
Attached here are the patches that have changed since last time. I hope
sending them all in one email won't cause any headaches!
Patches 1 and 4 haven't changed, although patch 4 is posted again to
avoid a needless merge problem.
The changes are as follows:
Part 2: Added support for constexpr member functions (this was missing
from the last patch for some inexplicable reason!) and cut down
the number of unnecessary APValue copies.
Part 3: Added a fix for Richard's nasty corner case and cut down the
number of unnecessary APValue copies, plus slight changes in
support of changes in patch 2.
Part 5: Additional test-cases:
constexpr-cache.cpp -- checking validity of cache
constexpr-sine.cpp -- a sine lookup table to stress performance
Performance stats have also been added to constexpr-nqueens.cpp
and constexpr-turing.cpp.
A note regarding constexpr-sine.cpp: I've limited the size of the table
so that the compile time under a debug build is not too awful. Under a
release build, performance approaches that of the other well-known
compiler but does not surpass it, so improvements can still be made...
All comments, corrections and challenges welcome! :o)
Andy
On Monday, September 03, 2012 6:41 PM, Andy Gibbs wrote:
> On Saturday, August 25, 2012 3:51 AM, Richard Smith wrote:
>> Hi Andy,
>>
>> Editorializing somewhat, I think it's unfortunate that people expect
>> constexpr evaluation to be asymptotically more efficient than the
>> runtime evaluation of the same functions, and I'm a little concerned
>> that this caching will lead people to write code which has surprisingly
>> poor performance in the cases where a constexpr function gets invoked
>> dynamically. But... g++ already implements a similar optimization, so
>> we'll probably need this to handle code written for it.
>>
>> Have you measured the impact of this cache on "well-written" constexpr
>> functions (those not requiring memoization for efficiency)? What's the
>> memory usage and performance impact on, say,
>> test/SemaCXX/constexpr-turing.cpp, with a more complex Turing machine?
>> Can you provide some figures for typical and worst-case slowdowns?
>
> Ok, I've done some performance testing and here are some rough values.
>
> There are two clang test-cases which are interesting with regard to the
> memoisation feature, namely constexpr-turing and constexpr-nqueens. I
> imagine that you would consider these to be "well written" -- it seems,
> at any rate, that you are the author, so I am guessing so! :o)
>
> With constexpr-turing, we have a test-case that is unable to make use
> of the cached evaluations. In this case, I see a 5% slow-down in
> compilation, plus a moderate ~200KB of additional memory consumption.
>
> With constexpr-nqueens, we have a much more positive result in terms
> of a 19% speed-up in compilation, but at a cost of ~13MB of additional
> memory.
>
> In my own code, I have seen anything from a 2% slow-down to a 25%
> speed-up -- this is, of course, only for "well written" constexpr
> functions since I get a 2000% to 5000% speed-up for the rest, at least
> for those I was willing to test without caching (the others being too
> slow)!
>
> In terms of additional compiler memory, I've seen increases of up to
> ~100MB, but this really comes down to the constexpr algorithms in use.
>
> As a comparison, constexpr-torture consumes an additional ~10MB, but
> finishes in ~1.5 seconds as opposed to upwards of 3 hours (the point
> at which I terminated it) without the cache.
>
> We could still take the route of an attribute/f-switch pair to
> override the behaviour of the cache. My suggestion, if you feel the
> performance costs are too great, would be a -fconstexpr-cache on/
> off switch for global use and an __attribute__((constexpr-cache on/
> off)) for local use.
>
>> There is a nasty corner case where the value of a global changes
>> between successive calls to a constexpr function (this happens if a
>> variable is read after it's zero-initialized but before it's
>> dynamically initialized, from a constexpr function called within its
>> own initializer). Your patch doesn't seem to handle that (though
>> perhaps I just missed it).
>
> I fixed the code to support your corner case. I'm still a little bit
> surprised that it isn't outlawed, but I've seen your other posts on the
> net talking about what's permitted and not, so I'm more than willing to
> accept your better understanding. For myself, I'll class it as "don't
> try this at home"!
>
>> Have you considered using llvm::FoldingSet for the cache? That would
>> be particularly useful if we ever want to allow arbitrary literal
>> types as non-type template arguments, or other such cases which might
>> require reducing an APValue to a FoldingSetNodeID.
>
> I believe this means supporting FoldingSetTrait for APValues, which is
> not currently done -- I don't have the time to do this at present. Can
> it be left to another day?
>
>> There seems to be a fair amount of APValue copying going on in this
>> change. For large APValues, that's not cheap. Perhaps you could
>> allocate the cache entry up-front, and evaluate the function arguments
>> and return value directly into its APValue slots, or swap them in?
>
> Done -- this took the constexpr-nqueens test-case from ~11% to ~19%
> speed-up, for example.
>
>> Does this have an impact on the constexpr recursion limit? For instance,
>> if f(500) recurses to depth 500, and g(500) recurses to depth 500 then
>> calls f(500), will the result of calling g(500) depend on whether I've
>> called f(500) already in this translation unit? That seems like a highly
>> unfortunate property, but it's consistent with how template
>> instantiations behave at least.
>
> Yes, there is an impact on the recursion limit, but not in a negative
> way (in my opinion!): if f(500) is already cached, then calling g(500)
> doesn't run to a recursion depth of 1000, only 501, since f(500) is
> picked out of the cache rather than recursively re-evaluated.
>
>> Thanks for looking into this!
>
> No problem. I'll try to split out the revised patches and post them
> shortly... Any further things before I do so?
>
> Cheers
> Andy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part5-new.diff
Type: application/octet-stream
Size: 27419 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120904/699f412d/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part4-new.diff
Type: application/octet-stream
Size: 5892 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120904/699f412d/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part3-new.diff
Type: application/octet-stream
Size: 8163 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120904/699f412d/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part2-new.diff
Type: application/octet-stream
Size: 28387 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120904/699f412d/attachment-0003.obj>