[llvm-dev] IntrusiveRefCntPtr vs std::shared_ptr

Wed Oct 19 11:31:26 PDT 2016

> On Oct 19, 2016, at 11:14 AM, Bruce Hoult via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On Wed, Oct 19, 2016 at 6:24 PM, Benjamin Kramer via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > In terms of performance shared_ptr has a number of disadvantages. One
> > is that it always uses atomics even though most IntrusiveRefCntPtrs
> > are used in single-threaded contexts. Another is that weak_ptr adds a
> > lot of complexity to the implementation; IntrusiveRefCntPtr doesn't
> > support weak references.
> > 
> > With that it's hard to make a case for changing uses of
> > IntrusiveRefCntPtr, as it's a non-trivial amount of work with a
> > potential negative performance impact. IntrusiveRefCntPtr binds the
> > reference count to the object itself; shared_ptr doesn't. Figuring out
> > when a value held by an IntrusiveRefCntPtr is passed around by raw
> > pointer and stuffed into another IntrusiveRefCntPtr is hard.
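
To make the point about the count living in the object concrete, here is a minimal sketch. RefCounted, IntrusivePtr, Node and rewrapThroughRawPointer are hypothetical, simplified stand-ins written for this illustration, not LLVM's actual classes. It only shows why a raw pointer can safely be wrapped a second time when the counter is embedded in the object.

```cpp
#include <cassert>
#include <utility>

// Hypothetical, simplified stand-in for an intrusive reference-count base.
// The counter is a plain integer stored inside the object (no atomics).
class RefCounted {
  mutable unsigned Count = 0;

public:
  void retain() const { ++Count; }
  // Returns true when the last reference has just been dropped.
  bool release() const {
    assert(Count > 0 && "release() without matching retain()");
    return --Count == 0;
  }
  unsigned useCount() const { return Count; }
};

// Hypothetical, simplified intrusive smart pointer.
template <typename T> class IntrusivePtr {
  T *Obj = nullptr;

public:
  IntrusivePtr() = default;
  IntrusivePtr(T *P) : Obj(P) {
    if (Obj)
      Obj->retain();
  }
  IntrusivePtr(const IntrusivePtr &O) : Obj(O.Obj) {
    if (Obj)
      Obj->retain();
  }
  IntrusivePtr &operator=(IntrusivePtr O) {
    std::swap(Obj, O.Obj);
    return *this;
  }
  ~IntrusivePtr() {
    if (Obj && Obj->release())
      delete Obj;
  }
  T *get() const { return Obj; }
  T *operator->() const { return Obj; }
};

struct Node : RefCounted {
  int Value = 42;
};

// Wraps the same object twice via a raw pointer and reports the count.
unsigned rewrapThroughRawPointer() {
  IntrusivePtr<Node> A(new Node);
  Node *Raw = A.get();
  // Safe: B finds the same embedded counter, so the object has one count.
  IntrusivePtr<Node> B(Raw);
  // By contrast, constructing two separate std::shared_ptr<Node> objects
  // from the same raw pointer would create two control blocks and
  // eventually delete the object twice.
  return Raw->useCount();
}
```

Any code that currently round-trips through a raw pointer like this would have to be found and rewritten (for example to pass the smart pointer itself) before a switch to shared_ptr is safe.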
>  
> In terms of performance, the whole concept has a number of disadvantages :-)
> 
> I recently tried an experiment. I compiled a 40,000-line C file (all the files of a project concatenated together) to .bc with clang, and then ran llc on it. I tried it on both Ubuntu 16.04 x64 and on an Odroid XU-4 ARM board, with very similar results.
> 
> I made a tiny library with a 1 GB static char array. I made a malloc() that simply bumped a pointer (prepending a 32-bit object size, just for realloc(), grrrrrr kill it with fire), and a free() that is an empty function. There's a calloc() that calls the above malloc() and then memset(). And a realloc() that is a no-op if the new size is smaller, or does malloc() and memcpy() if it is bigger.
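
A minimal sketch of an allocator along those lines, for readers who want to see its shape. This is not the code used in the experiment. The bump_* names, the 1 MiB arena (the post used 1 GB), and the 16-byte header padding used to keep returned pointers aligned are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace {
constexpr std::size_t kArenaSize = 1u << 20; // assumption: 1 MiB, not 1 GB
constexpr std::size_t kHeader = 16; // holds the 32-bit size, padded to 16
alignas(16) char arena[kArenaSize];
std::size_t used = 0; // bytes handed out so far
} // namespace

// Bump the pointer; store the requested size just before the payload.
void *bump_malloc(std::size_t size) {
  assert(used % 16 == 0);
  if (size > kArenaSize)
    return nullptr;
  // Round the payload up so the next block stays 16-byte aligned.
  std::size_t total = kHeader + ((size + 15) & ~std::size_t{15});
  if (total > kArenaSize - used)
    return nullptr; // arena exhausted
  char *block = arena + used;
  used += total;
  std::uint32_t stored = static_cast<std::uint32_t>(size);
  std::memcpy(block, &stored, sizeof stored);
  return block + kHeader;
}

// free() is intentionally an empty function: memory is never reclaimed.
void bump_free(void *) {}

// calloc() is malloc() followed by memset().
void *bump_calloc(std::size_t n, std::size_t size) {
  if (size != 0 && n > kArenaSize / size)
    return nullptr; // would overflow or cannot fit
  void *p = bump_malloc(n * size);
  if (p)
    std::memset(p, 0, n * size);
  return p;
}

// realloc() is a no-op when shrinking, else malloc() plus memcpy().
void *bump_realloc(void *ptr, std::size_t size) {
  if (!ptr)
    return bump_malloc(size);
  std::uint32_t old_size;
  std::memcpy(&old_size, static_cast<char *>(ptr) - kHeader, sizeof old_size);
  if (size <= old_size)
    return ptr;
  void *fresh = bump_malloc(size);
  if (fresh)
    std::memcpy(fresh, ptr, old_size);
  return fresh;
}
```

To use something like this with LD_PRELOAD, the functions would have to be exported from a shared library under the standard names (malloc, free, calloc, realloc) with C linkage, and made safe for concurrent callers if the program is multi-threaded; the sketch leaves both out.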
> 
> Then I used LD_PRELOAD to replace the standard malloc library with mine.
> 
> Result: ~10% faster execution than llc without LD_PRELOAD, and ~180 MB of the array used (120 MB on the 32-bit ARM).
> 
> Then I built BDW GC as a malloc replacement (with free() as a no-op) and used LD_PRELOAD with it.
> 
> Result: ~20% faster execution than llc without LD_PRELOAD, and ~10 MB of RAM used.
> 
> In this experiment all the reference counting in IntrusiveRefCntPtr or shared_ptr or whatever still takes place, the same as before. But at the end, when it decides to call free, it's a no-op. So all the reference-counting machinery is a complete waste of time and code and RAM and the program would run strictly faster if it was ripped out.

I may be missing something in your description, but it seems like you're never releasing memory? I'm not sure I follow how that is a good thing.
Also, what about destructors?

-- 
Mehdi

> 
> I don't know for sure (it's a lot more work to try!), but I would not be surprised to see a further 10%-20% speedup.
> 
> 
> And then you come to the cognitive load on the programmer, trying to decide whether to use IntrusiveRefCntPtr or shared_ptr or unique_ptr or auto_ptr or weak_ptr or whether and where to call free()/delete. And the extra typing needed to write it instead of using a raw pointer. And the extra time and cognitive load to read the code. And for what? 
> 
