[PATCH] D15449: [PassManagerBuilder] Add a few more scalar optimization passes

Tue Dec 15 01:11:21 PST 2015

Hi,

Thanks for all these nice results!

That looks good to me.

-- 
Mehdi

> On Dec 14, 2015, at 8:54 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
> 
> Hi Hal, Mehdi,
> 
> Hal: I checked and as I suspected, SCCP, instcombine, simplifycfg and bdce/adce all preserve GlobalsAA.
> 
> Mehdi: I have numbers for you :)
> 
> Running test-suite -flto on an AArch64 out-of-order platform:
>   lnt.MultiSource/Benchmarks/FreeBench/analyzer/analyzer <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.188=3> 2.57%
>   lnt.MultiSource/Benchmarks/McCat/17-bintr/bintr <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.151=3> -5.34%
>   lnt.MultiSource/Benchmarks/Ptrdist/bc/bc <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.72=3> -1.30%
> 
> So not much change (certainly less change than I expected), but overall positive.
> 
> On a third-party benchmark suite I get improvements ranging from 1% to 11%, and regressions from 1% to 3% (with more improvements than regressions). I also happen to know that when a patch currently under review goes in, an edge case in this suite gets triggered and one testcase doubles in performance.
> 
> I ran compile time numbers too. My test was codegenning/linking llvm-tblgen using -flto on a macbook pro:
>   with patch: 34.64s, 33.62s, 33.89s, 33.33s, 33.8s -  median: 33.80s
>   without patch: 34.26s, 34.49s, 33.54s, 31.89s, 32.57s - median: 33.54s
> 
> Difference in medians: 0.78% slower with the patch (the samples are so close that it probably needs more samples to be statistically significant, though!)
> 
> Cheers,
> 
> James
> 
> On Fri, 11 Dec 2015 at 18:05 Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>> On Dec 11, 2015, at 8:19 AM, James Molloy <james at jamesmolloy.co.uk <mailto:james at jamesmolloy.co.uk>> wrote:
>> 
>> Hi,
>> 
>> > - I'd rather see this as two patches: one for the GlobalOpt and the other for the scalar optimizations
>> 
>> Sure, that's easily done. Would you prefer me to open another phab review, or are you happy with it being committed split apart?
> 
> It is more about the commit, so that the performance can be assessed separately and any issue can be bisected more easily.
> 
>> 
>> > - Do you have benchmark results before/after?
>> 
>> Yes and no. The mem2reg changes do affect benchmarks I care about, but they're not in test-suite and I'm not allowed to quote numbers from them.
>> I don't have an LTO setup of the test-suite to get numbers for the LTO portions either (although I do have LTO set up for third party test suites that I can't quote numbers from!). I haven't seen any regressions in any test, and some improve drastically. Sorry for the weasel words.
> 
> Can you at least give an overview (without naming), like “on some internal benchmarks it improves XX% on average, with XX test cases that regressed around XX%”?
> 
> 
>> 
>> As a general principle, I think the LTO driver isn't currently doing enough scalar optimization. I've seen several cases where really poor code gets through to late passes like CGP purely because SimplifyCFG/InstCombine weren't run enough.
> 
> 
> Clearly, the problem is the tradeoff with the compile time.
> 
>> 
>> > - See also: http://reviews.llvm.org/D13443 <http://reviews.llvm.org/D13443> ; I paused my work on this till January because of the ThinLTO bringup, but I still plan to move forward with it.
>> 
>> This looks good. It looks like a real reengineering of the pipeline, which is a bit more work than I was hoping to chew off - I hope that my work might go some way towards improving the LTO codegen without requiring thousands of benchmarking hours to check it's OK!
> 
> 
> Indeed, I spent some hundreds of hours benchmarking in September. I’d be happy if you could test D13443 on your hardware/benchmarks, by the way :)
> 
> 
>> 
>> (Aside, in D13443 you don't run GlobalOpt/Mem2Reg early. I think functionattrs+globalopt+mem2reg needs to run as early as possible so that demoted globals become first class SSA values for the whole of the pass pipeline).
> 
> Note that GlobalOpt *also* needs to run after the inliner, because it can do more work then. But again, compile time...
> 
> -- 
> Mehdi
> 
> 
> 
> 
>> 
>> James
>> 
>> On Fri, 11 Dec 2015 at 16:08 Mehdi AMINI via llvm-commits <llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>> wrote:
>> joker.eph added a comment.
>> 
>> Hi James,
>> 
>> A few points:
>> 
>> - I'd rather see this as two patches: one for the GlobalOpt and the other for the scalar optimizations
>> - Do you have benchmark results before/after?
>> - See also: http://reviews.llvm.org/D13443 <http://reviews.llvm.org/D13443> ; I paused my work on this till January because of the ThinLTO bringup, but I still plan to move forward with it.
>> 
>> Thanks!
>> 
>> 
>> Repository:
>>   rL LLVM
>> 
>> http://reviews.llvm.org/D15449 <http://reviews.llvm.org/D15449>
>> 
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
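
As a quick check of the compile-time figures James quotes above, the medians and their relative difference can be recomputed with a short sketch. The timings are copied from his message; the variable names and the percent_change helper are illustrative assumptions, not anything from the patch or from LLVM tooling.

```python
from statistics import median

# Wall-clock timings in seconds, copied from the quoted message
# (codegen + link of llvm-tblgen with -flto).
with_patch = [34.64, 33.62, 33.89, 33.33, 33.80]
without_patch = [34.26, 34.49, 33.54, 31.89, 32.57]


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent.

    Hypothetical helper for this sketch; positive means candidate is slower.
    """
    return (candidate - baseline) / baseline * 100.0


median_with = median(with_patch)
median_without = median(without_patch)
slowdown = percent_change(median_without, median_with)

# Spread (max - min) of each sample set, to compare against the median gap.
spread_with = max(with_patch) - min(with_patch)
spread_without = max(without_patch) - min(without_patch)

print(f"median with patch:    {median_with:.2f}s (spread {spread_with:.2f}s)")
print(f"median without patch: {median_without:.2f}s (spread {spread_without:.2f}s)")
print(f"difference in medians: {slowdown:.2f}%")
```

With only five samples per configuration, the spread within each set (about 1.3s with the patch, 2.6s without) is larger than the 0.26s gap between the medians, which is consistent with the caveat that more samples would be needed.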
