[llvm-dev] IntrusiveRefCntPtr vs std::shared_ptr

Bruce Hoult via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 19 12:21:56 PDT 2016


Locality of reference (the working set largely fits in the L3 cache), and not
making the OS produce a large number of demand-zero CoW VM pages.

On Wed, Oct 19, 2016 at 10:19 PM, Mehdi Amini via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
> On Oct 19, 2016, at 12:16 PM, Bruce Hoult <bruce at hoult.org> wrote:
>
> On Wed, Oct 19, 2016 at 9:31 PM, Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>> On Oct 19, 2016, at 11:14 AM, Bruce Hoult via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> On Wed, Oct 19, 2016 at 6:24 PM, Benjamin Kramer via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> In terms of performance, shared_ptr has a number of disadvantages. One
>>> is that it always uses atomics, even though most IntrusiveRefCntPtrs
>>> are used in single-threaded contexts. Another is weak_ptr adding a lot
>>> of complexity to the implementation; IntrusiveRefCntPtr doesn't
>>> support weak references.
>>>
>>> With that, it's hard to make a case for changing uses of
>>> IntrusiveRefCntPtr: it's a non-trivial amount of work with potential
>>> negative performance impact. IntrusiveRefCntPtr binds the reference
>>> count to the object itself, while shared_ptr doesn't, so figuring out
>>> where a value held by an IntrusiveRefCntPtr is passed around by raw
>>> pointer and stuffed into another IntrusiveRefCntPtr is hard.
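>>>
>>> A minimal sketch of that difference (simplified; the real
>>> llvm::RefCountedBase and IntrusiveRefCntPtr carry more machinery):
>>>
>>>     struct Node {
>>>       mutable unsigned RefCount = 0;   // count lives inside the object
>>>       void Retain() const { ++RefCount; }
>>>       void Release() const {
>>>         if (--RefCount == 0)           // plain, non-atomic decrement
>>>           delete this;
>>>       }
>>>     };
>>>     // Copying an IntrusiveRefCntPtr<Node> costs one ordinary add.
>>>     // A std::shared_ptr<Node> typically keeps its counts in a separate
>>>     // control block and updates them with atomic RMW operations, even
>>>     // in a single-threaded program.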
>>>
>>
>> In terms of performance, the whole concept has a number of disadvantages
>> :-)
>>
>> I recently tried an experiment. I compiled a 40,000-line C file (all the
>> files of a project concatenated together) to .bc with clang, and then
>> ran llc on it. I tried it on both Ubuntu 16.04 x64 and on an Odroid
>> XU-4 ARM board, with very similar results.
>>
>> I made a tiny library with a 1 GB static char array. I made a malloc()
>> that simply bumps a pointer (prepending a 32-bit object size, just for
>> realloc(), grrrrrr kill it with fire), and a free() that is an empty
>> function. There's a calloc() that calls the above malloc() and then
>> memset(), and a realloc() that is a no-op if the new size is smaller, or
>> does malloc() and memcpy() if it is bigger.
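>>
>> A minimal sketch of what such a shim might look like (my reconstruction
>> from the description above, not the actual code; I widen the size header
>> to 16 bytes to keep malloc()'s usual alignment, and nothing here is
>> thread-safe):
>>
>>     #include <stddef.h>
>>     #include <string.h>
>>
>>     alignas(16) static char Arena[1u << 30];  // the 1 GB static array
>>     static size_t Offset = 0;
>>
>>     extern "C" void *malloc(size_t Size) {
>>       char *Base = Arena + Offset;
>>       *(size_t *)Base = Size;          // stash the size for realloc()
>>       Offset += 16 + ((Size + 15) & ~(size_t)15);
>>       return Base + 16;                // stays 16-byte aligned
>>     }
>>
>>     extern "C" void free(void *) {}    // deliberately a no-op
>>
>>     extern "C" void *calloc(size_t N, size_t Size) {
>>       return memset(malloc(N * Size), 0, N * Size);
>>     }
>>
>>     extern "C" void *realloc(void *Old, size_t Size) {
>>       if (!Old)
>>         return malloc(Size);
>>       size_t OldSize = *(size_t *)((char *)Old - 16);
>>       if (Size <= OldSize)
>>         return Old;                    // shrinking is a no-op
>>       return memcpy(malloc(Size), Old, OldSize);
>>     }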
>>
>> Then I used LD_PRELOAD to replace the standard malloc library with mine.
>>
>> Result: ~10% faster execution than llc without LD_PRELOAD, and ~180 MB of
>> the array used (120 MB on the 32-bit ARM).
>>
>> Then I built the Boehm-Demers-Weiser collector (BDW GC) as a malloc
>> replacement (with free() as a no-op) and used LD_PRELOAD with it.
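>>
>> One plausible shape for such a shim, using BDW GC's public gc.h API
>> (GC_malloc returns zero-filled memory and GC_realloc handles resizing;
>> whether the build used here did exactly this is my assumption, and BDW
>> GC also offers a configure-time redirect-malloc mode that does roughly
>> the same thing internally):
>>
>>     #include <gc.h>
>>     #include <stddef.h>
>>
>>     // Depending on platform and version, the collector may need
>>     // explicit initialization before the first allocation.
>>     __attribute__((constructor)) static void InitGC() { GC_INIT(); }
>>
>>     extern "C" void *malloc(size_t Size) { return GC_malloc(Size); }
>>     extern "C" void free(void *) {}   // unreachable memory is collected
>>     extern "C" void *calloc(size_t N, size_t Size) {
>>       return GC_malloc(N * Size);     // GC_malloc already zeroes
>>     }
>>     extern "C" void *realloc(void *Old, size_t Size) {
>>       return GC_realloc(Old, Size);
>>     }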
>>
>> Result: ~20% faster execution than llc without LD_PRELOAD, and ~10 MB of
>> RAM used.
>>
>> In this experiment all the reference counting in IntrusiveRefCntPtr
>> or shared_ptr or whatever still takes place, the same as before. But at the
>> end, when it decides to call free(), that call is a no-op. So all the
>> reference-counting machinery is a complete waste of time, code, and RAM,
>> and the program would run strictly faster if it were ripped out.
>>
>>
>> I may be missing something in your description, but it seems like you're
>> never releasing memory? I'm not sure I follow how that is a good thing.
>>
>
> I did two different tests.
>
> In the first test I never released memory. The compiler allocated 120-180
> MB of total memory compiling a 40,000-line C file. Typical C files are
> much smaller than this, so it's potentially a valid strategy if you make a
> new invocation of the compiler for every C file. However, it was mostly
> just for statistics-gathering purposes.
>
> In the second test I used a GC. I never explicitly released memory, but it
> was collected when objects were no longer reachable.
>
>
> OK, I see.
> How do you explain that the GC allocation provides a 10% speedup over the
> simple “bump ptr allocator” (if I understand your results correctly)?
>
>> Mehdi
>
>> Also, what about destructors?
>>
>
> Stack-based objects would still have their destructors called; heap-based
> objects would not. As 99% of destructors only deal with releasing other
> memory owned by the object anyway, this is not important.
>
> Some destructors may be closing files or something like that, but I didn't
> notice any problems. The compiler ran fine in both cases, and produced asm
> output identical to running it normally.
>
> This is just an experiment. Obviously, if someone were to decide to
> replace explicit memory management with GC in the LLVM project, then some
> real work would be required to audit the code and find any issues.
>