[libcxx] r185343 - Implement n3658 - Compile-time integer sequences

Tue Jul 2 07:56:43 PDT 2013

On Jul 1, 2013, at 10:13 PM, Richard Smith <richard at metafoo.co.uk> wrote:

> On Mon, Jul 1, 2013 at 4:57 PM, Howard Hinnant <hhinnant at apple.com> wrote:
>> On Jul 1, 2013, at 6:17 PM, Howard Hinnant <hhinnant at apple.com> wrote:
>> 
>>> Inspired by your approach, I started from scratch and came up with the enclosed.  The compiler will run out of memory before it runs into the recursive template instantiation limit, and this works with far higher numbers.  It passes all the current intseq tests.
>> 
>> I did some timing tests for the compiler using:
>> 
>> #include <utility>
>> #include <iostream>
>> 
>> int
>> main()
>> {
>>    typedef std::make_index_sequence<10000> T;
>>    std::cout << typeid(T).name() << "\n";
>> }
>> 
>> and:
>> 
>> #include <utility>
>> #include <iostream>
>> 
>> int
>> main()
>> {
>>    typedef std::make_index_sequence<100000> T;
>>    std::cout << typeid(T).name() << "\n";
>> }
>> 
>> These are extremely long sequences!  I discovered that if I add to the patch enclosed in my previous email:
>> 
>> template <class _Tp, _Tp _Sp, _Tp _Ep>
>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 4>
>> {
>>    typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2, _Sp+3> type;
>> };
>> 
>> template <class _Tp, _Tp _Sp, _Tp _Ep>
>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 3>
>> {
>>    typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2> type;
>> };
>> 
>> template <class _Tp, _Tp _Sp, _Tp _Ep>
>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 2>
>> {
>>    typedef integer_sequence<_Tp, _Sp, _Sp+1> type;
>> };
>> 
>> that there is between a 50% to 100% improvement in compile time, and lower compiler memory usage (unmeasured), with no other difference in behavior.  There seems to be a sweet spot somewhere in this neighborhood (3 <= _Diff <= 8) between complexity, added benefit, and even performance with lower numbers.  I rather arbitrarily selected 4 as the optimum.
>> 
>> However I'm unsure how this compares to Richard's algorithm as I never was able to get that working.  Perhaps though if someone does, we could make an informed comparison and choose the best.  Regardless this is much better than the algorithm Marshall first posted, which I was responsible for.  Its genesis can be found in <__tuple>'s __make_tuple_indices.  When we get make_integer_sequence settled, I plan on updating <__tuple> accordingly.
> 
> My algorithm is fixed up and attached. I upped the divisor to 8; that
> was the sweet spot in my testing. I also switched to performing the
> pack building in size_t and converting to the sequence's type at the
> end (and thus using explicit specializations rather than partial
> specializations); that made the size_t case about 35% faster, but the
> non-size_t cases about 40% slower (converting from
> integer_sequence<size_t, ...> to integer_sequence<T, ...> is now,
> remarkably, slower than building the integer_sequence<size_t, ...>).
> It's probably possible to tweak this to get the same performance for
> all T, but I've not yet found a trick for that.
> <mis.cc>

I copy/pasted this in and ran the unit tests and one of the tests caused clang to consume over a Gb of memory before I aborted the test.  I guessed that the test responsible for this was inputing a negative N.  So I renamed your __make_integer_sequence to __make_integer_sequence_unchecked, and put checking on top of it and tested again.  This is much faster than what I posted for very long sequences.  Looks like 10X faster.

I went through and macro-protected your suggestion and am posting it back here for review.

Howard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: integer_sequence.patch
Type: application/octet-stream
Size: 4851 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130702/eb682bee/attachment.obj>