[cfe-dev] std::function is inefficient
David Krauss
potswa at gmail.com
Thu May 21 09:01:09 PDT 2015
The nonstatic members of std::function are declared like this:
typename aligned_storage<3*sizeof(void*)>::type __buf_;
__base* __f_;
sizeof(std::function) is apparently supposed to be 16 bytes on 32-bit systems and 32 bytes on 64-bit. On my system, it's 48 bytes: the maximum alignment is 16 bytes, so the 24-byte __buf_ is rounded up to 32 bytes, and there are 8 padding bytes after the pointer.
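The size arithmetic can be checked with a small sketch. The struct below is a hypothetical stand-in that only mirrors the two members quoted above; it is not libc++'s class, and the names layout_sketch, round_up and expected_size are made up for illustration. The two static_asserts spell out the 64-bit, 16-byte-alignment case described in the text; the actual numbers on another platform depend on the default alignment its library picks for aligned_storage.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical mirror of the two members quoted above; not libc++'s class.
struct layout_sketch {
    typedef std::aligned_storage<3 * sizeof(void*)>::type buffer_type;
    buffer_type buf;  // small-object buffer
    void* f;          // stands in for __base* __f_
};

// Round n up to the next multiple of align (align must be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// The case from the text: 8-byte pointers, 16-byte maximum alignment.
// 24 bytes of storage round up to 32; adding the 8-byte pointer gives 40,
// which rounds up to 48.
static_assert(round_up(3 * 8, 16) == 32, "buffer size");
static_assert(round_up(32 + 8, 16) == 48, "whole object size");

// The same computation using whatever alignment the current platform chose.
inline std::size_t expected_size() {
    std::size_t buf = round_up(3 * sizeof(void*),
                               alignof(layout_sketch::buffer_type));
    return round_up(buf + sizeof(void*), alignof(layout_sketch));
}
```

Comparing expected_size() with sizeof(layout_sketch) shows whether a given platform follows the same rounding.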
As for run-time cost, the extra hop through a __base* to reach a vtable is unnecessary in the first place.
Here's an outline of an architecture that I think would be faster, slimmer, and better. It has better locality because calls can be dispatched without accessing the object on the heap at all. There are fewer memory accesses total. The number of indirect branches remains the same. One less pointer gets stored, FWIW.
It does presume that the abstract base pointer and the derived pointers are identical, but then so does the current implementation (see __func::__clone(__base*)). It does use C++ polymorphic classes, and I’m not sure whether that’s really kosher (see my previous message here).
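The pointer-identity assumption can be probed with a short sketch. All names here (layout_check, abstract_base, single, multi) are hypothetical. The C++ standard does not guarantee either result; the comments describe what the common ABIs (Itanium, MSVC) do in practice.

```cpp
#include <cassert>

namespace layout_check {

struct abstract_base {
    virtual void run() = 0;
    virtual ~abstract_base() {}
};

// Single inheritance: on common ABIs the base subobject sits at offset 0.
struct single : abstract_base {
    int x = 0;
    void run() override {}
};

struct other {
    int y = 0;
    virtual ~other() {}
};

// Multiple inheritance: abstract_base is the second base, so on common ABIs
// it is placed after the `other` subobject and the pointers differ.
struct multi : other, abstract_base {
    void run() override {}
};

// True when the abstract_base subobject has the same address as the object.
template <typename D>
bool base_shares_address(D& d) {
    abstract_base* b = &d;  // implicit derived-to-base conversion
    return static_cast<void*>(b) == static_cast<void*>(&d);
}

inline bool single_ok() { single s; return base_shares_address(s); }
inline bool multi_ok()  { multi m;  return base_shares_address(m); }

}  // namespace layout_check
```

The design only ever derives the stored wrappers directly and solely from the abstract base, which is the first case.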
template< typename ret, typename ... args >
struct __base {
    virtual ret call( args && ... ) = 0;
    virtual void __clone(void *) const = 0;
    // etc.
    virtual ~__base() {}
};
template< typename ret, typename ... args >
struct __empty_func
    : __base< ret, args ... > {
    virtual ret call( args && ... )
        { throw bad_function_call(); }
    virtual void __clone( void * p ) const
        { new( p ) __empty_func; }
    // etc.
};
template< typename fp, typename alloc, typename ret, typename ... args >
struct __small_func
    : __base< ret, args ... > {
    __compressed_pair< fp, alloc > __f_;
    virtual ret call( args && ... a )
        { return __f_.first()( forward< args >( a ) ... ); }
    virtual void __clone( void * p ) const
        { new( p ) __small_func(__f_); } // allocator only used when assigning a new type
    // (Note: Current implementation does not support allocator erasure.)
    // etc.
};
template< typename fp, typename alloc, typename ret, typename ... args >
struct __big_func
    : __base< ret, args ... > {
    fp *__f_;
    alloc __a_;
    virtual ret call( args && ... a )
        { return (*__f_)( forward< args >( a ) ... ); }
    virtual void __clone( void * p ) const
        { new( p ) __big_func( new fp(*__f_) ); } // rather, use the allocator.
    // etc.
};
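template< typename ret, typename ... args >
struct function< ret( args ... ) > {
    typename aligned_storage< 4 * sizeof(void*) >::type __buf_;

    // The stored wrapper always starts at the beginning of the buffer.
    __base<ret, args...> * __target()
        { return reinterpret_cast<__base<ret, args...>*>(&__buf_); }
    __base<ret, args...> const * __target() const
        { return reinterpret_cast<__base<ret, args...> const *>(&__buf_); }

    function() {
        new(&__buf_) __empty_func<ret, args...>;
    }

    explicit operator bool () const {
        return typeid(__empty_func<ret, args...>)
            != typeid(* __target());
    }

    void __clear() {
        __target()->~__base();
        new(&__buf_) __empty_func<ret, args...>; // do this as a scope guard
    }

    template< typename target >
    enable_if_t< sizeof(__small_func<target, allocator<target>, ret, args...>) <= sizeof(__buf_),
        function & >
    operator = ( target const & o ) {
        __target()->~__base(); // destroy the previous target first
        try {
            new(&__buf_) __small_func<target, allocator<target>, ret, args...>( o );
        } catch (...) {
            new(&__buf_) __empty_func<ret, args...>; // leave *this empty
            throw;
        }
        return *this;
    }

    ~ function()
        { __target()->~__base(); }
};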
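For anyone who wants to experiment, the outline can be reduced to a miniature that compiles on its own. This is only a sketch under several assumptions: everything lives in a made-up namespace sketch with unreserved names, the allocator and the heap-backed __big_func case are left out, copy assignment is deleted to keep it short, and the reinterpret_cast relies on the pointer-identity assumption mentioned earlier (a strictly conforming C++17 version would also want std::launder).

```cpp
#include <cassert>
#include <functional>   // std::bad_function_call
#include <new>
#include <type_traits>
#include <typeinfo>
#include <utility>

namespace sketch {

template <typename R, typename... A>
struct base {
    virtual R call(A&&...) = 0;
    virtual void clone(void* p) const = 0;  // copy-construct *this into p
    virtual ~base() {}
};

template <typename R, typename... A>
struct empty_func : base<R, A...> {
    R call(A&&...) override { throw std::bad_function_call(); }
    void clone(void* p) const override { new (p) empty_func; }
};

template <typename F, typename R, typename... A>
struct small_func : base<R, A...> {
    F f;
    explicit small_func(const F& x) : f(x) {}
    R call(A&&... a) override { return f(std::forward<A>(a)...); }
    void clone(void* p) const override { new (p) small_func(f); }
};

template <typename> class function;

template <typename R, typename... A>
class function<R(A...)> {
    typedef base<R, A...> base_t;
    typedef typename std::aligned_storage<4 * sizeof(void*)>::type storage_t;
    storage_t buf_;  // always holds a live object derived from base_t

    base_t* get() { return reinterpret_cast<base_t*>(&buf_); }
    const base_t* get() const { return reinterpret_cast<const base_t*>(&buf_); }

public:
    function() { new (&buf_) empty_func<R, A...>; }
    function(const function& o) { o.get()->clone(&buf_); }
    function& operator=(const function&) = delete;  // omitted in this sketch
    ~function() { get()->~base_t(); }

    explicit operator bool() const {
        return typeid(*get()) != typeid(empty_func<R, A...>);
    }

    // Dispatch goes straight through the vptr stored inside buf_.
    R operator()(A... a) { return get()->call(std::forward<A>(a)...); }

    // Only participates when the wrapper fits in the buffer.
    template <typename F>
    typename std::enable_if<
        (sizeof(small_func<F, R, A...>) <= sizeof(storage_t)) &&
        (alignof(small_func<F, R, A...>) <= alignof(storage_t)),
        function&>::type
    operator=(const F& f) {
        get()->~base_t();  // destroy the previous target first
        try {
            new (&buf_) small_func<F, R, A...>(f);
        } catch (...) {
            new (&buf_) empty_func<R, A...>;  // leave *this empty
            throw;
        }
        return *this;
    }
};

// Usage: assign a small lambda, copy the wrapper, call through the copy.
inline int demo() {
    function<int(int, int)> f;
    bool was_empty = !f;
    int k = 10;
    f = [k](int a, int b) { return a + b + k; };
    function<int(int, int)> g(f);
    return (was_empty && f && g) ? g(1, 2) : -1;
}

}  // namespace sketch
```

Calling an empty sketch::function throws std::bad_function_call from empty_func::call, so operator() needs no separate emptiness check.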
Let me know what you think…
- Thanks,
David