[llvm-dev] IntrusiveRefCntPtr vs std::shared_ptr

Bruce Hoult via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 19 12:20:36 PDT 2016


On Wed, Oct 19, 2016 at 10:06 PM, David Jones <djones at xtreme-eda.com> wrote:

> Out of curiosity, when you did these tests, did you compile your test
> program with -O2?
>

The test program, or llvm?

llvm was the standard llvm installed by Ubuntu's package manager. I expect
it's compiled with some nonzero -O level

I think I compiled my test program without optimization. That will change
the amount of work llvm does, but I think not the nature of it.

I'm happy to do more tests.


>
>
I am working on a large codebase that uses intrusive reference counting.
> I'm finding that without -O (e.g. build for easy debug) it's quite slow,
> but with -O2 it's maybe 4x faster. This indicates that optimization can
> clean up a lot of the overhead of reference counting.
>
> Whether or not a GC such as BDWGC can help you depends on how much "memory
> churn" your program does. Again, I've found that my software tends to
> allocate and free quite a lot of memory, so if I used GC, then I'd spend
> 25%+ time doing the GC. On the other hand, if you do more computing than
> allocating, then you may come out ahead - GC has zero overhead when not
> actually allocating, whereas reference-counting can have overhead when
> traversing a data structure even if you malloc/free nothing.
>
> As always, the particular application is what matters, and your mileage
> may vary.
>
>
> On Wed, Oct 19, 2016 at 2:14 PM, Bruce Hoult via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Wed, Oct 19, 2016 at 6:24 PM, Benjamin Kramer via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> In terms of performance shared_ptr has a number of disadvantages. One
>>> is that it always uses atomics even though most IntrusiveRefCntPtrs
>>> are used in single-threaded contexts. Another is weak_ptr adding a lot
>>> of complexity to the implementation, IntrusiveRefCntPtr doesn't
>>> support weak references.
>>>
>>> With that it's hard to make a case for changing uses of
>>> IntrusiveRefCntPtr as it's a non-trivial amount of work
>>> (IntrusiveRefCntPtr binds the reference count to the object itself,
>>> shared_ptr doesn't. Figuring out when a value held by an
>>> IntrusiveRefCntPtr is passed around by raw pointer and stuffed into
>>> another IntrusiveRefCntPtr is hard) with potential negative
>>> performance impact.
>>>
>>
>> In terms of performance, the whole concept has a number of disavantages
>> :-)
>>
>> I recently tried an experiment. I compiled a 40000 line C file
>> (concatenated all the files of a project together) to .bc with clang, and
>> then ran llc on it. I tried it on both Ubuntu 16.04 x64 and on an Odroid
>> XU-4 ARM board. with very similar results.
>>
>> I made a tiny library with a 1 GB static char array. I made a malloc()
>> that simply bumped a pointer (prepending a 32 bit object size, just for
>> realloc(), grrrrrr kill it with fire), and a free() that is an empty
>> function. There's a calloc() that calls the above malloc() and then
>> memset(). And a realloc() that is a no-op if the size is smaller, or does
>> malloc(), memcpy() if bigger.
>>
>> Then I used LD_PRELOAD to replace the standard malloc library with mine.
>>
>> Result: ~10% faster execution than llc without LD_PRELOAD, and ~180 MB of
>> the array used (120 MB on the 32 bit ARM).
>>
>> Then I built BDW GC as a malloc replacement (with free() as a no-op) and
>> used LD_PRELOAD with it.
>>
>> Result: ~20% faster execution than llc without LD_PRELOAD, and ~10 MB of
>> RAM used.
>>
>> In this experiment all the reference counting in IntrusiveRefCntPtr
>> or shared_ptr or whatever still takes place, the same as before. But at the
>> end, when it decides to call free, it's a no-op. So all the
>> reference-counting machinery is a complete waste of time and code and RAM
>> and the program would run strictly faster if it was ripped out.
>>
>> I don't know for sure (it's a lot more work to try!), but I would not be
>> surprised to see a further 10%-20% speedup.
>>
>>
>> And then you come to the cognitive load on the programmer, trying to
>> decide whether to use IntrusiveRefCntPtr or shared_ptr or unique_ptr or
>> auto_ptr or weak_ptr or whether and where to call free()/delete. And the
>> extra typing needed to write it instead of using a raw pointer. And the
>> extra time and cognitive load to read the code. And for what?
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
> --
> This message has been scanned for viruses and
> dangerous content by *MailScanner* <http://www.mailscanner.info/>, and is
> believed to be clean.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161019/92e09398/attachment.html>


More information about the llvm-dev mailing list