<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">OK: because the GC was freeing memory to be reused, makes sense.<div class=""><br class=""></div><div class="">Thanks!</div><div class=""><br class=""></div><div class="">— </div><div class="">Mehdi</div><div class=""><br class=""></div><div class=""><div><blockquote type="cite" class=""><div class="">On Oct 19, 2016, at 12:21 PM, Bruce Hoult <<a href="mailto:bruce@hoult.org" class="">bruce@hoult.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Locality of reference (largely fitting into L3 cache), and not having to produce a large number of demand-zero CoW VM pages from the OS.</div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Wed, Oct 19, 2016 at 10:19 PM, Mehdi Amini via llvm-dev <span dir="ltr" class=""><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><br class=""><div class=""><div class=""><div class="h5"><blockquote type="cite" class=""><div class="">On Oct 19, 2016, at 12:16 PM, Bruce Hoult <<a href="mailto:bruce@hoult.org" target="_blank" class="">bruce@hoult.org</a>> wrote:</div><br class="m_-6438330443652757763Apple-interchange-newline"><div class=""><div dir="ltr" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><div class="gmail_extra"><div class="gmail_quote">On Wed, Oct 19, 2016 at 9:31 PM, Mehdi Amini via llvm-dev<span class="m_-6438330443652757763Apple-converted-space"> </span><span dir="ltr" 
class=""><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.<wbr class="">org</a>></span><span class="m_-6438330443652757763Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><br class=""><div class=""><span class="m_-6438330443652757763gmail-"><blockquote type="cite" class=""><div class="">On Oct 19, 2016, at 11:14 AM, Bruce Hoult via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_-6438330443652757763gmail-m_5485103407779497724Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote">On Wed, Oct 19, 2016 at 6:24 PM, Benjamin Kramer via llvm-dev<span class="m_-6438330443652757763Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.<wbr class="">org</a>></span><span class="m_-6438330443652757763Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">In terms of performance shared_ptr has a number of disadvantages. One<br class="">is that it always uses atomics even though most IntrusiveRefCntPtrs<br class="">are used in single-threaded contexts. Another is weak_ptr adding a lot<br class="">of complexity to the implementation, IntrusiveRefCntPtr doesn't<br class="">support weak references.<br class=""><br class="">With that it's hard to make a case for changing uses of<br class="">IntrusiveRefCntPtr as it's a non-trivial amount of work<br class="">(IntrusiveRefCntPtr binds the reference count to the object itself,<br class="">shared_ptr doesn't. 
Figuring out when a value held by an<br class="">IntrusiveRefCntPtr is passed around by raw pointer and stuffed into<br class="">another IntrusiveRefCntPtr is hard) with potential negative<br class="">performance impact.<br class=""><div class="m_-6438330443652757763gmail-m_5485103407779497724gmail-HOEnZb"><div class="m_-6438330443652757763gmail-m_5485103407779497724gmail-h5"><span style="color:rgb(34,34,34)" class=""></span></div></div></blockquote><div class=""> </div><div class="">In terms of performance, the whole concept has a number of disadvantages :-)</div><div class=""><br class=""></div><div class="">I recently tried an experiment. I compiled a 40,000-line C file (all the files of a project concatenated together) to .bc with clang, and then ran llc on it. I tried it on both Ubuntu 16.04 x64 and on an Odroid XU-4 ARM board, with very similar results.</div><div class=""><br class=""></div><div class="">I made a tiny library with a 1 GB static char array. I made a malloc() that simply bumped a pointer (prepending a 32-bit object size, just for realloc(), grrrrrr kill it with fire), and a free() that is an empty function. There's a calloc() that calls the above malloc() and then memset(). 
And a realloc() that is a no-op if the new size is smaller, or does malloc() and memcpy() if it is bigger.</div><div class=""><br class=""></div><div class="">Then I used LD_PRELOAD to replace the standard malloc library with mine.</div><div class=""><br class=""></div><div class="">Result: ~10% faster execution than llc without LD_PRELOAD, and ~180 MB of the array used (120 MB on the 32-bit ARM).</div><div class=""><br class=""></div><div class="">Then I built BDW GC as a malloc replacement (with free() as a no-op) and used LD_PRELOAD with it.</div><div class=""><br class=""></div><div class="">Result: ~20% faster execution than llc without LD_PRELOAD, and ~10 MB of RAM used.</div><div class=""><br class=""></div><div class="">In this experiment all the reference counting in IntrusiveRefCntPtr or shared_ptr or whatever still takes place, the same as before. But at the end, when it decides to call free(), it's a no-op. So all the reference-counting machinery is a complete waste of time and code and RAM, and the program would run strictly faster if it were ripped out.</div></div></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">I may be missing something in your description, but it seems like you’re never releasing memory? I’m not sure I follow how that is a good thing. </div><div class=""></div></div></div></blockquote><div class=""><br class="m_-6438330443652757763gmail-Apple-interchange-newline">I did two different tests.</div><div class=""><br class=""></div><div class="">In the first test I never released memory. The compiler allocated 120 - 180 MB of total memory compiling a 40,000-line C file. Typical C files are much smaller than this, so it's potentially a valid strategy if you make a new invocation of the compiler for every C file. However, it was mostly just for statistics-gathering purposes.</div><div class=""><br class=""></div><div class="">In the second test I used a GC. 
I never explicitly released memory, but it was collected when objects were no longer reachable.</div></div></div></div></div></blockquote><div class=""><br class=""></div></div></div><div class="">OK I see.</div><div class="">How do you explain that the GC allocation provides a 10% speedup over the simple “bump ptr allocator” (if I understand your results correctly)?</div><div class=""><br class=""></div><div class="">— </div><div class="">Mehdi</div><span class=""><div class=""><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class="">Also, what about destructors?</div><div class=""></div></div></div></blockquote><div class=""> <br class=""></div><div class="">Stack-based objects would still have destructors called; heap-based objects would not. As 99% of destructors only deal with releasing other memory owned by the object anyway, this is not important.</div><div class=""><br class=""></div><div class="">Some destructors may be closing files or something like that. I didn't notice problems. The compiler ran fine in both cases, and produced asm output identical to running it normally.</div><div class=""><br class=""></div><div class="">This is just an experiment. 
Obviously, if someone were to decide to replace explicit memory management with GC in the llvm project then some real work would be required to audit the code and find any issues.</div></div></div></div></div></blockquote></span></div><br class=""></div><br class="">______________________________<wbr class="">_________________<br class="">
LLVM Developers mailing list<br class="">
<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br class="">
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank" class="">http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-dev</a><br class="">
<br class=""></blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></body></html>