[llvm-dev] IntrusiveRefCntPtr vs std::shared_ptr

Wed Oct 19 11:14:17 PDT 2016

On Wed, Oct 19, 2016 at 6:24 PM, Benjamin Kramer via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> In terms of performance shared_ptr has a number of disadvantages. One
> is that it always uses atomics even though most IntrusiveRefCntPtrs
> are used in single-threaded contexts. Another is weak_ptr adding a lot
> of complexity to the implementation, IntrusiveRefCntPtr doesn't
> support weak references.
>
> With that it's hard to make a case for changing uses of
> IntrusiveRefCntPtr as it's a non-trivial amount of work
> (IntrusiveRefCntPtr binds the reference count to the object itself,
> shared_ptr doesn't. Figuring out when a value held by an
> IntrusiveRefCntPtr is passed around by raw pointer and stuffed into
> another IntrusiveRefCntPtr is hard) with potential negative
> performance impact.
>

In terms of performance, the whole concept has a number of disadvantages :-)

I recently tried an experiment. I compiled a 40000 line C file
(concatenated all the files of a project together) to .bc with clang, and
then ran llc on it. I tried it on both Ubuntu 16.04 x64 and on an Odroid
XU-4 ARM board, with very similar results.

I made a tiny library with a 1 GB static char array. I made a malloc() that
simply bumped a pointer (prepending a 32 bit object size, just for
realloc(), grrrrrr kill it with fire), and a free() that is an empty
function. There's a calloc() that calls the above malloc() and then
memset(). And a realloc() that is a no-op if the size is smaller, or does
malloc(), memcpy() if bigger.
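
For illustration, here is a minimal sketch of a bump allocator along the
lines described above. It is not the code used in the experiment. The
bump_* names, the 16-byte alignment, the padded header and the 64 MB default
arena are all assumptions made for the sketch (the experiment used a 1 GB
array), and it is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef ARENA_SIZE
#define ARENA_SIZE (64u * 1024u * 1024u) /* assumption: the post used 1 GB */
#endif
#define ALIGN 16u /* assumption: header padded to 16 bytes to keep payloads aligned */

static _Alignas(16) char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Bump the pointer; store the requested size as a 32-bit value in the header. */
void *bump_malloc(size_t size) {
    if (size > ARENA_SIZE) return NULL;
    size_t need = ALIGN + ((size + ALIGN - 1) & ~(size_t)(ALIGN - 1));
    if (need > ARENA_SIZE - arena_used) return NULL; /* arena exhausted */
    char *block = arena + arena_used;
    arena_used += need;
    assert(arena_used <= ARENA_SIZE);
    uint32_t stored = (uint32_t)size;
    memcpy(block, &stored, sizeof stored);
    return block + ALIGN;
}

/* Freeing is deliberately an empty function: memory is never reused. */
void bump_free(void *ptr) { (void)ptr; }

void *bump_calloc(size_t count, size_t size) {
    if (size != 0 && count > SIZE_MAX / size) return NULL; /* overflow */
    void *p = bump_malloc(count * size);
    if (p) memset(p, 0, count * size);
    return p;
}

/* No-op when shrinking; otherwise allocate a new block and copy the old bytes. */
void *bump_realloc(void *ptr, size_t size) {
    if (ptr == NULL) return bump_malloc(size);
    uint32_t old;
    memcpy(&old, (char *)ptr - ALIGN, sizeof old);
    if (size <= old) return ptr;
    void *p = bump_malloc(size);
    if (p) memcpy(p, ptr, old);
    return p;
}
```

To interpose on a real program the four functions would have to be exported
as malloc, free, calloc and realloc from a shared library; the sketch keeps
the bump_ prefix so it can be compiled and tried inside an ordinary program.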

Then I used LD_PRELOAD to replace the standard malloc library with mine.

Result: ~10% faster execution than llc without LD_PRELOAD, and ~180 MB of
the array used (120 MB on the 32 bit ARM).

Then I built BDW GC as a malloc replacement (with free() as a no-op) and
used LD_PRELOAD with it.

Result: ~20% faster execution than llc without LD_PRELOAD, and ~10 MB of
RAM used.

In this experiment all the reference counting in IntrusiveRefCntPtr
or shared_ptr or whatever still takes place, the same as before. But at the
end, when it decides to call free, it's a no-op. So all the
reference-counting machinery is a complete waste of time and code and RAM
and the program would run strictly faster if it was ripped out.

I don't know for sure (it's a lot more work to try!), but I would not be
surprised to see a further 10%-20% speedup from removing it.

And then you come to the cognitive load on the programmer, trying to decide
whether to use IntrusiveRefCntPtr or shared_ptr or unique_ptr or auto_ptr
or weak_ptr or whether and where to call free()/delete. And the extra
typing needed to write it instead of using a raw pointer. And the extra
time and cognitive load to read the code. And for what?