<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">Thanks for all these nice results!</div><div class=""><br class=""></div><div class="">That looks good to me.</div><div class=""><br class=""></div><div class="">— </div><div class="">Mehdi</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 14, 2015, at 8:54 AM, James Molloy <<a href="mailto:james@jamesmolloy.co.uk" class="">james@jamesmolloy.co.uk</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi Hal, Mehdi,<div class=""><br class=""></div><div class="">Hal: I checked and, as I suspected, SCCP, instcombine, simplifycfg and bdce/adce all preserve GlobalsAA.</div><div class=""><br class=""></div><div class="">Mehdi: I have numbers for you :)</div><div class=""><br class=""></div><div class="">Running test-suite -flto on an AArch64 out-of-order platform:</div><div class="">  <a href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.188=3" style="color:rgb(0,136,204);text-decoration:none;font-size:12px;line-height:20px" class="">lnt.MultiSource/Benchmarks/FreeBench/analyzer/analyzer</a> 2.57%</div><div class=""><a href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.151=3" style="color:rgb(0,136,204);text-decoration:none;font-size:12px;line-height:20px" class="">lnt.MultiSource/Benchmarks/McCat/17-bintr/bintr</a> -5.34%<br class=""></div><div class=""><a href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.72=3" style="color:rgb(0,136,204);text-decoration:none;font-size:12px;line-height:20px" class="">lnt.MultiSource/Benchmarks/Ptrdist/bc/bc</a> -1.30%<br class=""></div><div class=""><br class=""></div><div class="">So not much change (certainly less change than I 
expected), but overall positive.</div><div class=""><br class=""></div><div class="">On a third-party benchmark I get improvements ranging from 1% to 11% and regressions from 1% to 3% (with more improvements than regressions). I also happen to know that when a patch under review goes in, an edge case in this suite gets triggered and one testcase doubles in performance.</div><div class=""><br class=""></div><div class="">I ran compile time numbers too. My test was codegenning/linking llvm-tblgen using -flto on a MacBook Pro:</div><div class="">  with patch: 34.64s, 33.62s, 33.89s, 33.33s, 33.80s - median: 33.80s</div><div class="">  without patch: 34.26s, 34.49s, 33.54s, 31.89s, 32.57s - median: 33.54s</div><div class=""><br class=""></div><div class="">Difference in medians: 0.78% slower with the patch (the samples are so close that it would probably take more samples to be statistically significant, though!)</div><div class=""><br class=""></div><div class="">Cheers,</div><div class=""><br class=""></div><div class="">James</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Fri, 11 Dec 2015 at 18:05 Mehdi Amini <<a href="mailto:mehdi.amini@apple.com" class="">mehdi.amini@apple.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><blockquote type="cite" class=""><div class="">On Dec 11, 2015, at 8:19 AM, James Molloy <<a href="mailto:james@jamesmolloy.co.uk" target="_blank" class="">james@jamesmolloy.co.uk</a>> wrote:</div><br class=""><div class=""><div dir="ltr" class="">Hi,<div class=""><br class=""></div><div class="">> <span style="line-height:1.5" class="">- I'd rather see this as two patches: one for the GlobalOpt and the other for the scalar optimizations</span></div><br class=""><div class="">Sure, that's easily done. 
Would you prefer me to open another Phab review, or are you happy with it being committed as two separate commits?</div></div></div></blockquote><div class=""><br class=""></div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class="">It is more about the commit, so that the performance can be assessed separately and any issue can be bisected more easily.</div></div></div><div style="word-wrap:break-word" class=""><div class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">> <span style="line-height:1.5" class="">- Do you have benchmark results before/after?</span></div><br class=""><div class="">Yes and no. The mem2reg changes do affect benchmarks I care about, but they're not in test-suite and I'm not allowed to quote numbers from them.</div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">I don't have an LTO setup of the test-suite to get numbers for the LTO portions either (although I do have LTO set up for third-party test suites that I can't quote numbers from!). I haven't seen any regressions in any test, and some improve drastically. Sorry for the weasel words.</div></div></div></blockquote><div class=""><br class=""></div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class="">Can you at least give an overview (without naming), like “on some internal benchmarks it improves XX% on average, with XX test cases that regressed by around XX%”?</div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""><blockquote type="cite" class=""><div dir="ltr" class=""></div></blockquote></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">As a general principle, I think the LTO driver isn't currently doing enough scalar optimization. 
I've seen several cases where really poor code gets through to late passes like CGP purely because SimplifyCFG/InstCombine weren't run enough.</div></div></div></blockquote><div class=""><br class=""></div><div class=""><br class=""></div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class="">Clearly, the problem is the tradeoff with the compile time.</div></div></div><div style="word-wrap:break-word" class=""><div class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">> - See also: <a href="http://reviews.llvm.org/D13443" rel="noreferrer" target="_blank" class="">http://reviews.llvm.org/D13443</a> ; I paused my work on this till January because of the ThinLTO bringup, but I still plan to move forward with it.<br class=""></div><div class=""><br class=""></div><div class="">This looks good. It looks like a real reengineering of the pipeline, which is a bit more work than I was hoping to bite off - I hope that my work might go some way towards improving the LTO codegen without requiring thousands of benchmarking hours to check it's OK!</div></div></div></blockquote><div class=""><br class=""></div><div class=""><br class=""></div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class="">Indeed, I spent several hundred hours benchmarking in September. I’d be happy if you could test D13443 on your hardware/bench by the way :)</div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">(Aside, in D13443 you don't run GlobalOpt/Mem2Reg early. 
I think functionattrs+globalopt+mem2reg need to run as early as possible so that demoted globals become first-class SSA values for the whole of the pass pipeline).</div></div></div></blockquote><div class=""><br class=""></div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class="">Note that GlobalOpt *also* needs to run after the inliner, because it can do more work then. But again, compile time...</div><div class=""><br class=""></div><div class="">— </div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class="">Mehdi</div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">James</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Fri, 11 Dec 2015 at 16:08 Mehdi AMINI via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">joker.eph added a comment.<br class="">

<br class="">

Hi James,<br class="">

<br class="">

A few points:<br class="">

<br class="">

- I'd rather see this as two patches: one for the GlobalOpt and the other for the scalar optimizations<br class="">

- Do you have benchmark results before/after?<br class="">

- See also: <a href="http://reviews.llvm.org/D13443" rel="noreferrer" target="_blank" class="">http://reviews.llvm.org/D13443</a> ; I paused my work on this till January because of the ThinLTO bringup, but I still plan to move forward with it.<br class="">

<br class="">

Thanks!<br class="">

<br class="">

<br class="">

Repository:<br class="">

  rL LLVM<br class="">

<br class="">

<a href="http://reviews.llvm.org/D15449" rel="noreferrer" target="_blank" class="">http://reviews.llvm.org/D15449</a><br class="">

<br class="">

<br class="">

<br class="">

_______________________________________________<br class="">

llvm-commits mailing list<br class="">

<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a><br class="">

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank" class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br class="">

</blockquote></div>

</div></blockquote></div></div></blockquote></div>

</div></blockquote></div><br class=""></div></body></html>