<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">Although this question is not specific to LLVM, it seems that only compiler gurus can answer it, so I am giving it a try here and hoping for the best.</div><div class=""><br class=""></div><div class="">In a recent talk by Chandler Carruth, “Efficiency with Algorithms, Performance with Data Structures” ( <a href="https://www.youtube.com/watch?v=fHNmRkzxHWs" class="">https://www.youtube.com/watch?v=fHNmRkzxHWs</a> ), Chandler says that one should never return results by reference (through an output parameter) for the sake of efficiency. Although I agree with him in most cases, because pointers make it difficult for a compiler to optimise, I do not always have an efficient solution with value semantics. Here is the case that I am thinking of:</div><div class=""><br class=""></div><div class="">====</div><div class="">std::vector&lt;double&gt; f(std::size_t i);</div><div class=""><br class=""></div><div class="">auto v = std::vector&lt;double&gt;( n );</div><div class="">for (std::size_t i = 0; i &lt; 1000; ++i) {</div><div class=""> auto w = f(i);</div><div class=""> for (std::size_t k = 0; k &lt; v.size(); ++k) {</div><div class=""> v[k] += w[k];</div><div class=""> }</div><div class="">}</div><div class="">
====</div><div class=""><br class=""></div><div class="">which would be much slower than</div><div class=""><br class=""></div><div class="">====</div><div class=""><div class="">void f(std::size_t i, std::vector&lt;double&gt;&amp; w);</div><div class=""><br class=""></div><div class="">auto v = std::vector&lt;double&gt;( n );</div><div class="">auto w = std::vector&lt;double&gt;( n );</div><div class="">for (std::size_t i = 0; i &lt; 1000; ++i) {</div><div class=""> f(i, w);</div><div class=""> for (std::size_t k = 0; k &lt; v.size(); ++k) {</div><div class=""> v[k] += w[k];</div><div class=""> }</div><div class="">}</div></div><div class="">====</div><div class=""><br class=""></div><div class="">because, in this second version, the call to f performs no memory allocation inside the i-loop. In the Q&amp;A, where someone seems to give him such an example (at 1:06:46), he says that smart compilers such as LLVM can deduplicate memory allocations. That does not seem to me to be applicable to this kind of algorithm. Does anyone have a concrete example where a compiler deduplicates memory allocations?</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">François</div></body></html>