<div dir="ltr">Hi Hal, Mehdi,<div><br></div><div>Hal: I checked and as I suspected, SCCP, instcombine, simplifycfg and bdce/adce all preserve GlobalsAA.</div><div><br></div><div>Mehdi: I have numbers for you :)</div><div><br></div><div>Running test-suite -flto on an AArch64 out-of-order platform:</div><div> <a href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.188=3" style="color:rgb(0,136,204);text-decoration:none;font-size:12px;line-height:20px">lnt.MultiSource/Benchmarks/FreeBench/analyzer/analyzer</a> 2.57%</div><div><a href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.151=3" style="color:rgb(0,136,204);text-decoration:none;font-size:12px;line-height:20px">lnt.MultiSource/Benchmarks/McCat/17-bintr/bintr</a> -5.34%<br></div><div><a href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.72=3" style="color:rgb(0,136,204);text-decoration:none;font-size:12px;line-height:20px">lnt.MultiSource/Benchmarks/Ptrdist/bc/bc</a> -1.30%<br></div><div><br></div><div>So not much change (certainly less change than I expected), but overall positive.</div><div><br></div><div>On a third party benchmark I get improvements ranging from 1%-11%, and regressions from 1% to 3% (with more improvements than regressions). I also happen to know that when a patch under review goes in, an edge case in this suite gets triggered and one testcase doubles in performance.</div><div><br></div><div>I ran compile time numbers too. 
My test was codegenning/linking llvm-tblgen using -flto on a MacBook Pro:</div><div> with patch: 34.64s, 33.62s, 33.89s, 33.33s, 33.80s - median: 33.80s</div><div> without patch: 34.26s, 34.49s, 33.54s, 31.89s, 32.57s - median: 33.54s</div><div><br></div><div>Difference in medians: 0.78% slower with the patch (the samples are so close that it probably needs more of them to be statistically significant, though!)</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, 11 Dec 2015 at 18:05 Mehdi Amini &lt;<a href="mailto:mehdi.amini@apple.com">mehdi.amini@apple.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><blockquote type="cite"><div>On Dec 11, 2015, at 8:19 AM, James Molloy &lt;<a href="mailto:james@jamesmolloy.co.uk" target="_blank">james@jamesmolloy.co.uk</a>&gt; wrote:</div><br><div><div dir="ltr">Hi,<div><br></div><div>&gt; <span style="line-height:1.5">- I'd rather see this as two patches: one for the GlobalOpt and the other for the scalar optimizations</span></div><br><div>Sure, that's easily done. Would you prefer me to open another phab review, or are you happy with it being committed split apart?</div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word"><div><div>It is more about the commit, so that the performance can be assessed separately and any issue can be bisected more easily.</div></div></div><div style="word-wrap:break-word"><div><br><blockquote type="cite"><div><div dir="ltr"><div><br></div><div>&gt; <span style="line-height:1.5">- Do you have benchmark results before/after?</span></div><br><div>Yes and no. 
The mem2reg changes do affect benchmarks I care about, but they're not in test-suite and I'm not allowed to quote numbers from them.</div></div></div></blockquote><blockquote type="cite"><div><div dir="ltr"><div>I don't have an LTO setup of the test-suite to get numbers for the LTO portions either (although I do have LTO set up for third party test suites that I can't quote numbers from!). I haven't seen any regressions in any test, and some improve drastically. Sorry for the weasel words.</div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word"><div><div>Can you at least give an overview (without naming), like “on some internal benchmarks it improves XX% on average, with XX test cases that regressed around ~XX%” ?</div></div></div><div style="word-wrap:break-word"><div><div><br><blockquote type="cite"><div dir="ltr"></div></blockquote></div><br><blockquote type="cite"><div><div dir="ltr"><div><br></div><div>As a general principle, I think the LTO driver isn't currently doing enough scalar optimization. I've seen several cases where really poor code gets through to late passes like CGP purely because SimplifyCFG/InstCombine weren't run enough.</div></div></div></blockquote><div><br></div><div><br></div></div></div><div style="word-wrap:break-word"><div><div>Clearly, the problem is the tradeoff with the compile time.</div></div></div><div style="word-wrap:break-word"><div><br><blockquote type="cite"><div><div dir="ltr"><div><br></div><div>> - See also: <a href="http://reviews.llvm.org/D13443" rel="noreferrer" target="_blank">http://reviews.llvm.org/D13443</a> ; I paused my work on this till January because of the ThinLTO bringup, but I still plan to move forward with it.<br></div><div><br></div><div>This looks good. 
It looks like a real re-engineering of the pipeline, which is a bit more work than I was hoping to bite off - I hope that my work might go some way towards improving the LTO codegen without requiring thousands of benchmarking hours to check it's OK!</div></div></div></blockquote><div><br></div><div><br></div></div></div><div style="word-wrap:break-word"><div><div>Indeed, I spent some hundreds of hours benchmarking in September. I’d be happy if you could test D13443 on your hardware/benchmarks, by the way :)</div></div></div><div style="word-wrap:break-word"><div><div><br></div><br><blockquote type="cite"><div><div dir="ltr"><div><br></div><div>(Aside: in D13443 you don't run GlobalOpt/Mem2Reg early. I think functionattrs+globalopt+mem2reg needs to run as early as possible so that demoted globals become first-class SSA values for the whole of the pass pipeline.)</div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word"><div><div>Note that GlobalOpt *also* needs to run after the inliner, because it can do more work at that point. But again, compile time...</div><div><br></div><div>-- </div></div></div><div style="word-wrap:break-word"><div><div>Mehdi</div></div></div><div style="word-wrap:break-word"><div><div><br></div><div><br></div><div><br></div><br><blockquote type="cite"><div><div dir="ltr"><div><br></div><div>James</div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, 11 Dec 2015 at 16:08 Mehdi AMINI via llvm-commits &lt;<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">joker.eph added a comment.<br>
<br>
Hi James,<br>
<br>
A few points:<br>
<br>
- I'd rather see this as two patches: one for the GlobalOpt and the other for the scalar optimizations<br>
- Do you have benchmark results before/after?<br>
- See also: <a href="http://reviews.llvm.org/D13443" rel="noreferrer" target="_blank">http://reviews.llvm.org/D13443</a> ; I paused my work on this till January because of the ThinLTO bringup, but I still plan to move forward with it.<br>
<br>
Thanks!<br>
<br>
<br>
Repository:<br>
rL LLVM<br>
<br>
<a href="http://reviews.llvm.org/D15449" rel="noreferrer" target="_blank">http://reviews.llvm.org/D15449</a><br>
<br>
<br>
<br>
</blockquote></div>
</div></blockquote></div></div></blockquote></div>