<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Nov 28, 2016, at 6:04 PM, Michael Spencer &lt;<a href="mailto:bigcheesegs@gmail.com" class="">bigcheesegs@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote">On Tue, Nov 29, 2016 at 10:18 AM, Mehdi AMINI via Phabricator <span dir="ltr" class="">&lt;<a href="mailto:reviews@reviews.llvm.org" target="_blank" class="">reviews@reviews.llvm.org</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">mehdi_amini added a comment.<br class="">

<br class="">

What is the motivation? This is removing a feature. Even though it is not used at this time, it is tested and could find a use in the future.<br class="">

<br class="">

<br class="">

<a href="https://reviews.llvm.org/D27159" rel="noreferrer" target="_blank" class="">https://reviews.llvm.org/<wbr class="">D27159</a><br class="">

<br class="">

<br class="">

<br class="">

</blockquote></div></div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">shared_future has a large performance overhead compared to just handing off a function pointer to another thread to run. </div></div></div></blockquote><br class=""></div><div>That’s a good point, but “large overhead” is quite subjective. Note also that the LLVM ThreadPool has a global lock, so it is not intended for a lot of very small tasks (&lt;100ms).</div><div>My first prototype was much more complex: it wrapped libdispatch when available and fell back to C++11 constructs otherwise. That is much better for a lot of small tasks!</div><div>However, the complexity was not worth it for my use case at the time: ThinLTO tasks range between 100ms and a few seconds, so a few ms of overhead doesn’t matter.</div><div><br class=""></div><div>To focus on the submission part, I queued an empty task multiple times in a thread pool with 0 threads, never deleted or synchronized, and measured just the queuing time.</div><div class=""><br class=""></div><div class="">For 1,000,000 queued tasks, the total went from 180ms to 500ms, which amounts on average to the queuing time for one task going from 180ns to 500ns, so the overhead of shared_future is 320ns per task.</div><div class=""><br class=""></div><div class="">As a matter of comparison, to evaluate the overhead of the rest of the thread pool infrastructure, I reran the same experiment but this time with one thread in the pool processing these empty tasks. It took over 7s (so over 20 times the std::future overhead). Also, adding threads increases the contention and the performance drops (8.5s with 2 threads, and 11s with 4 threads).</div><div class=""><br class=""></div><div class="">-- </div><div class="">Mehdi</div><div class=""><br class=""></div></body></html>