[cfe-commits] [Patch 4 of 5] Proposed fix for PR12850
Andy Gibbs
andyg1001 at hotmail.co.uk
Tue Sep 4 07:40:00 PDT 2012
Attached here are the patches that have changed since last time. I hope
sending them all in one email won't cause any headaches!
Patches 1 and 4 haven't changed, although patch 4 is posted again to
avoid a needless merge problem.
The changes are as follows:
Part 2: Added support for constexpr member functions (this was missing
from the last patch for some inexplicable reason!) and cut down
the number of unnecessary APValue copies.
Part 3: Added a fix for Richard's nasty corner case and cut down the
number of unnecessary APValue copies, plus slight changes in
support of changes in patch 2.
Part 5: Additional test-cases:
constexpr-cache.cpp -- checking validity of cache
constexpr-sine.cpp -- a sine lookup table to stress performance
Performance stats have also been added to constexpr-nqueens.cpp
and constexpr-turing.cpp.
A note regarding constexpr-sine.cpp: I've limited the size of the table
so that the compile time under a debug build is not too awful. Under a
release build, performance approaches that of the other well-known
compiler but does not surpass it, so improvements can still be made...
All comments, corrections and challenges welcome! :o)
Andy
On Monday, September 03, 2012 6:41 PM, Andy Gibbs wrote:
> On Saturday, August 25, 2012 3:51 AM, Richard Smith wrote:
>> Hi Andy,
>>
>> Editorializing somewhat, I think it's unfortunate that people expect
>> constexpr evaluation to be asymptotically more efficient than the
>> runtime evaluation of the same functions, and I'm a little concerned
>> that this caching will lead people to write code which has surprisingly
>> poor performance in the cases where a constexpr function gets invoked
>> dynamically. But... g++ already implements a similar optimization, so
>> we'll probably need this to handle code written for it.
>>
>> Have you measured the impact of this cache on "well-written" constexpr
>> functions (those not requiring memoization for efficiency)? What's the
>> memory usage and performance impact on, say,
>> test/SemaCXX/constexpr-turing.cpp, with a more complex Turing machine?
>> Can you provide some figures for typical and worst-case slowdowns?
>
> Ok, I've done some performance testing and here are some rough values.
>
> There are two clang test-cases which are interesting with regard to the
> memoisation feature, namely constexpr-turing and constexpr-nqueens. I
> imagine that you would consider these to be "well written" -- it seems,
> at any rate, that you are the author, so I am guessing so! :o)
>
> With constexpr-turing, we have a test-case that is unable to make use
> of the cached evaluations. In this case, I see a 5% slow-down in
> compilation, plus a moderate ~200KB of additional memory consumption.
>
> With constexpr-nqueens, we have a much more positive result in terms
> of a 19% speed-up in compilation, but at a cost of ~13MB of additional
> memory.
>
> In my own code, I have seen anything from a 2% slow-down to a 25%
> speed-up -- this is, of course, only for "well written" constexpr
> functions since I get a 2000% to 5000% speed-up for the rest, at least
> for those I was willing to test without caching (the others being too
> slow)!
>
> In terms of additional compiler memory, I've seen increases of up to
> ~100MB, but this really comes down to the constexpr algorithms in use.
>
> As a comparison, constexpr-torture consumes an additional ~10MB, but
> finishes in ~1.5 seconds as opposed to upwards of 3 hours (the point
> at which I terminated it) without the cache.
>
> We could still take the route of an attribute/f-switch pair to
> override the behaviour of the cache. My suggestion, if you feel the
> performance costs are too great, would be a -fconstexpr-cache on/
> off switch for global use and an __attribute__((constexpr-cache on/
> off)) for local use.
>
>> There is a nasty corner case where the value of a global changes
>> between successive calls to a constexpr function (this happens if a
>> variable is read after it's zero-initialized but before it's
>> dynamically initialized, from a constexpr function called within its
>> own initializer). Your patch doesn't seem to handle that (though
>> perhaps I just missed it).
>
> I fixed the code to support your corner case. I'm still a little bit
> surprised that it isn't outlawed, but I've seen your other posts on the
> net talking about what's permitted and not, so I'm more than willing to
> accept your better understanding. For myself, I'll class it as "don't
> try this at home"!
>
>> Have you considered using llvm::FoldingSet for the cache? That would
>> be particularly useful if we ever want to allow arbitrary literal
>> types as non-type template arguments, or other such cases which might
>> require reducing an APValue to a FoldingSetNodeID.
>
> I believe this means supporting FoldingSetTrait for APValues, which is
> not currently done -- I don't have the time to do this at present. Can
> it be left to another day?
>
>> There seems to be a fair amount of APValue copying going on in this
>> change. For large APValues, that's not cheap. Perhaps you could
>> allocate the cache entry up-front, and evaluate the function arguments
>> and return value directly into its APValue slots, or swap them in?
>
> Done -- this took the constexpr-nqueens test-case from ~11% to ~19%
> speed-up, for example.
>
>> Does this have an impact on the constexpr recursion limit? For instance,
>> if f(500) recurses to depth 500, and g(500) recurses to depth 500 then
>> calls f(500), will the result of calling g(500) depend on whether I've
>> called f(500) already in this translation unit? That seems like a highly
>> unfortunate property, but it's consistent with how template
>> instantiations behave at least.
>
> Yes, there is an impact on the recursion limit, but not in a negative
> way (in my opinion!): if f(500) is already cached, then calling g(500)
> doesn't run to a recursion depth of 1000, only 501, since f(500) is
> picked out of the cache rather than recursively re-evaluated.
>
>> Thanks for looking into this!
>
> No problem. I'll try to split out the revised patches and post them
> shortly... Any further things before I do so?
>
> Cheers
> Andy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part5-new.diff
Type: application/octet-stream
Size: 27419 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120904/699f412d/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part4-new.diff
Type: application/octet-stream
Size: 5892 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120904/699f412d/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part3-new.diff
Type: application/octet-stream
Size: 8163 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120904/699f412d/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part2-new.diff
Type: application/octet-stream
Size: 28387 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120904/699f412d/attachment-0003.obj>