[PATCH] D15449: [PassManagerBuilder] Add a few more scalar optimization passes
Mehdi Amini via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 15 01:11:21 PST 2015
Hi,
Thanks for all these nice results!
That looks good to me.
—
Mehdi
> On Dec 14, 2015, at 8:54 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
>
> Hi Hal, Mehdi,
>
> Hal: I checked and as I suspected, SCCP, instcombine, simplifycfg and bdce/adce all preserve GlobalsAA.
>
> Mehdi: I have numbers for you :)
>
> Running test-suite -flto on an AArch64 out-of-order platform:
> lnt.MultiSource/Benchmarks/FreeBench/analyzer/analyzer <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.188=3> 2.57%
> lnt.MultiSource/Benchmarks/McCat/17-bintr/bintr <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.151=3> -5.34%
> lnt.MultiSource/Benchmarks/Ptrdist/bc/bc <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3534/graph?test.72=3> -1.30%
>
> So not much change (certainly less change than I expected), but overall positive.
>
> On a third party benchmark I get improvements ranging from 1%-11%, and regressions from 1% to 3% (with more improvements than regressions). I also happen to know that when a patch under review goes in, an edge case in this suite gets triggered and one testcase doubles in performance.
>
> I ran compile time numbers too. My test was codegenning/linking llvm-tblgen using -flto on a macbook pro:
> with patch: 34.64s, 33.62s, 33.89s, 33.33s, 33.8s - median: 33.80s
> without patch: 34.26s, 34.49s, 33.54s, 31.89s, 32.57s - median: 33.54s
>
> Difference in medians: 0.78% (the samples are so close it probably needs more samples to be properly statistically relevant though!)
>
> Cheers,
>
> James
>
> On Fri, 11 Dec 2015 at 18:05 Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>> On Dec 11, 2015, at 8:19 AM, James Molloy <james at jamesmolloy.co.uk <mailto:james at jamesmolloy.co.uk>> wrote:
>>
>> Hi,
>>
>> > - I'd rather see this as two patches: one for the GlobalOpt and the other for the scalar optimizations
>>
>> Sure, that's easily done. Would you prefer me to open another phab review or are happy with it being committed split apart?
>
> It is more about the commit. So that the performance can be assessed separately and any issue would be better bisected.
>
>>
>> > - Do you have benchmark results before/after?
>>
>> Yes and no. The mem2reg changes do affect benchmarks I care about, but they're not in test-suite and I'm not allowed to quote numbers from them.
>> I don't have an LTO setup of the test-suite to get numbers for the LTO portions either (although I do have LTO set up for third party test suites that I can't quote numbers from!). I haven't seen any regressions in any test, and some improve drastically. Sorry for the weasel words.
>
> Can you at least give an overview (without naming), like “on some internal benchmarks it improves XX% on average, with XX test cases that regressed around ~XX%” ?
>
>
>>
>> As a general principle, I think the LTO driver isn't currently doing enough scalar optimization. I've seen several cases where really poor code gets through to late passes like CGP purely because SimplifyCFG/InstCombine weren't run enough.
>
>
> Clearly, the problem is the tradeoff with the compile time.
>
>>
>> > - See also: http://reviews.llvm.org/D13443 <http://reviews.llvm.org/D13443> ; I paused my work on this till January because of the ThinLTO bringup, but I still plan to move forward with it.
>>
>> This looks good. It looks like a real reegineering of the pipeline, which is a bit more work than I was hoping to chew off - I hope that my work might go some way towards improving the LTO codegen without requiring thousands of benchmarking hours to check it's OK!
>
>
> Indeed I spent some hundred of hours of benchmarking in September. I’d be happy if you could test D13443 on your hardware/bench by the way :)
>
>
>>
>> (Aside, in D13443 you don't run GlobalOpt/Mem2Reg early. I think functionattrs+globalopt+mem2reg needs to run as early as possible so that demoted globals become first class SSA values for the whole of the pass pipeline).
>
> Note that global opt needs *also* to run after the inliner because it can do more work. But again compile time...
>
> —
> Mehdi
>
>
>
>
>>
>> James
>>
>> On Fri, 11 Dec 2015 at 16:08 Mehdi AMINI via llvm-commits <llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>> wrote:
>> joker.eph added a comment.
>>
>> Hi James,
>>
>> A few points:
>>
>> - I'd rather see this as two patches: one for the GlobalOpt and the other for the scalar optimizations
>> - Do you have benchmark results before/after?
>> - See also: http://reviews.llvm.org/D13443 <http://reviews.llvm.org/D13443> ; I paused my work on this till January because of the ThinLTO bringup, but I still plan to move forward with it.
>>
>> Thanks!
>>
>>
>> Repository:
>> rL LLVM
>>
>> http://reviews.llvm.org/D15449 <http://reviews.llvm.org/D15449>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151215/62d7762b/attachment.html>
More information about the llvm-commits
mailing list