[llvm-dev] RFC: Switching to the new pass manager by default

Mon Oct 30 08:25:13 PDT 2017

Hi,

I compared the IR and can see what the issue is, though I have not yet found which pass is responsible for it.
The benchmark source: http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.c?view=markup
The function of interest is ‘value’. It contains a loop nest whose small inner loops each contain an if-statement. I see that the inner loops are fully unrolled. With the new pass manager, most of the cmp+br instructions are replaced with select instructions. This results in huge basic blocks (400+ instructions), so we create a lot of instructions which are now executed unconditionally. According to the execution profile of the “good” program, a lot of branches are not taken. The “good” program executes 1.01 Bn instructions; the “bad” program executes 1.62 Bn.

Another problem is that a lot of the created instructions are FP instructions: SCVTF/FADD/FCVTZS. According to the Cortex-A57 optimization guide they are quite heavy:

SCVTF: latency 10, throughput 1
FADD: latency 5, throughput 2
FCVTZS: latency 10, throughput 1

I think now it is clear why the regression is so huge.
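
To illustrate the effect described above, here is a minimal C sketch. It is not taken from fourinarow.c: the functions score_branchy, score_select and update, the cells array and the weight parameter are all hypothetical names chosen for illustration. The branchy form performs the int-to-FP-to-int update only when the condition holds. The select form models what if-conversion does: it always computes the update and then picks one of the two values. The counter fp_updates records how often the update actually runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the guarded work in the loop body.
   This is NOT the benchmark's code. */
static long fp_updates; /* number of FP updates actually executed */

static int update(int score, double weight) {
    fp_updates++;
    /* int -> FP, FP add, FP -> int: on AArch64 this would typically
       map to SCVTF, FADD and FCVTZS. */
    return (int)((double)score + weight);
}

/* Branchy form (cmp+br): the update runs only when the test succeeds. */
static int score_branchy(const int *cells, int n, double weight) {
    assert(cells != NULL);
    int score = 0;
    for (int i = 0; i < n; i++) {
        if (cells[i] != 0)
            score = update(score, weight);
    }
    return score;
}

/* If-converted form (select): the update is always computed and the
   condition only chooses which value to keep. */
static int score_select(const int *cells, int n, double weight) {
    assert(cells != NULL);
    int score = 0;
    for (int i = 0; i < n; i++) {
        int updated = update(score, weight);
        score = (cells[i] != 0) ? updated : score;
    }
    return score;
}
```

With 2 non-zero cells out of 8, both forms return the same score, but the branchy form performs 2 updates and the select form performs 8. When most branches are not taken, as in the profile of the “good” program, the select form pays for the heavy FP sequence on every iteration.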

As soon as I identify the pass responsible for this, I’ll create a bug report. Feel free to create it yourself if you know the pass. Just give me the ticket number and I’ll add the results of my analysis.

Thanks,
Evgeny Astigeevich

From: Sanjay Patel <spatel at rotateright.com>
Date: Wednesday, 25 October 2017 at 19:17
To: Hal Finkel <hfinkel at anl.gov>
Cc: Evgeny Astigeevich <Evgeny.Astigeevich at arm.com>, Chandler Carruth <chandlerc at gmail.com>, llvm-dev <llvm-dev at lists.llvm.org>, nd <nd at arm.com>
Subject: Re: [llvm-dev] RFC: Switching to the new pass manager by default

The new PM always runs -latesimplifycfg rather than -simplifycfg. Test to show it:
https://reviews.llvm.org/rL316351

Given that these patches cited SPEC and test-suite perf while changing the behavior of simplifycfg:
https://reviews.llvm.org/D30333
https://reviews.llvm.org/D35411

...the differences mentioned here might be related?

Kindly request review so we can close this hole. :)
https://reviews.llvm.org/D38631

On Wed, Oct 25, 2017 at 11:38 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:

On 10/25/2017 12:32 PM, Evgeny Astigeevich wrote:
Hi Hal,

I quickly checked the execution profile. It is real. The code changed significantly. A number of the hottest regions changed. I’ll compare IRs.

Thanks. Obviously a 1000% execution performance regression seems problematic.

 -Hal

JFYI FreeBench/fourinarow  time graph: http://lnt.llvm.org/db_default/v4/nts/graph?highlight_run=76922&plot.1604615=1349.1604615.3
Its graph in our LNT is more stable.

Thanks,
Evgeny

From: Hal Finkel <hfinkel at anl.gov>
Organization: Argonne National Laboratory
Date: Wednesday, 25 October 2017 at 18:14
To: Evgeny Astigeevich <Evgeny.Astigeevich at arm.com>, Chandler Carruth <chandlerc at gmail.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>, nd <nd at arm.com>
Subject: Re: [llvm-dev] RFC: Switching to the new pass manager by default

On 10/25/2017 12:10 PM, Evgeny Astigeevich via llvm-dev wrote:
Hi Chandler,

I ran the LNT benchmarks and SPEC2k6.train on AArch64 Cortex-A57. I used revisions: Clang 316561, LLVM 316563.
Options: -O3 -mcpu=cortex-a57 -fomit-frame-pointer -fexperimental-new-pass-manager

Regressions: execution time increase

LNT
MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow                              1018.58%

How real is this?

 -Hal

MultiSource/Benchmarks/Fhourstones/fhourstones                    9.06%
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2                        7.23%
MultiSource/Benchmarks/Olden/perimeter/perimeter                  6.87%
MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset  6.02%
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1                   5.59%
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk                    5.03%

SPEC2k6
453.povray           17.11%
482.sphinx3          3.44%
444.namd             2.89%

Improvements: execution time decrease

LNT
MultiSource/Benchmarks/BitBench/uudecode/uudecode        -50.90%
SingleSource/Benchmarks/Adobe-C++/loop_unroll            -27.75%
SingleSource/Benchmarks/Misc/perlin                      -21.35%
MultiSource/Benchmarks/Olden/em3d/em3d                   -19.12%
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4              -8.58%
SingleSource/Benchmarks/McGill/chomp                      -6.33%
MultiSource/Benchmarks/sim/sim                            -5.41%
MultiSource/Applications/ClamAV/clamscan                  -3.11%
MultiSource/Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl   -2.81%

SPEC2k6
429.mcf        -5.18%
473.astar      -2.65%
400.perlbench  -1.90%

There are also code size increases and decreases. The maximum increase is 18.98%. The maximum decrease is 25.65%.
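
As a reading aid for the execution time percentages above, here is a small C sketch that converts a relative change into a run-time ratio. It assumes the percentages are relative changes in execution time against the baseline build, as the "execution time increase" and "execution time decrease" headings suggest. The helper name time_ratio is hypothetical.

```c
#include <assert.h>

/* Converts a relative execution time change (in percent) into the
   ratio new_time / baseline_time. A change of +100% gives 2.0 and a
   change of -50% gives 0.5. Assumes the percentage is relative to
   the baseline run time. */
static double time_ratio(double percent_change) {
    /* A decrease of 100% or more would mean a non-positive run time. */
    assert(percent_change > -100.0);
    return 1.0 + percent_change / 100.0;
}
```

For example, time_ratio(1018.58) is about 11.19 for fourinarow, so the regressed binary takes roughly eleven times as long, and time_ratio(-50.90) is about 0.49 for uudecode.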

Thanks,
Evgeny Astigeevich

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org>
Reply-To: Chandler Carruth <chandlerc at gmail.com>
Date: Wednesday, 18 October 2017 at 07:51
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] RFC: Switching to the new pass manager by default

Greetings everyone!

The new pass manager is getting extremely close to the point where I'm not aware of any significant outstanding work needed, and I'd like to see what else would be needed to enable it by default. Here is the outstanding functionality I'm currently aware of:

1) Does not do non-trivial loop unswitching. The majority of this is in https://reviews.llvm.org/D34200 but it will need one or two small follow-ups.

2) Currently, sanitizers don't work correctly with it. Thanks to the work of others, the missing infrastructure has been added and I'll send a patch to wire this up this week.

3) Missing support for 'optnone'. I've been working on this, but the existing testing wasn't as thorough as I wanted, so it is going slowly. I've got about 1/4 of this implemented and should have patches this week or next.

4) Missing opt-bisect (or similar) facility. This looks pretty trivial to add, but I've not even started. If anyone is interested in it, go for it. We might even be able to do something simpler using the generic debug counters and get equivalent functionality.

... that's it?

Optimization quality / run-time performance:
- We've been using it at Google extensively and are very happy with the optimization quality. Benchmarks look *very* good here.
- More data from other users would be important.
- You can try it out by passing `-fexperimental-new-pass-manager` to Clang.

Compile-time performance:
- Sometimes *much* better due to cached analyses.
- Sometimes worse, typically due to more / different inlining, which in turn runs the main pipeline (GVN + InstCombine) more times or over more code.
- Overall somewhat a wash, but the increased compile times are typically due to the optimizer "trying" harder, so not too concerning on our end.
- Again, more feedback from other users would be good: pass `-fexperimental-new-pass-manager` to Clang.

Once the four missing things land, I'll also happily work on collecting some of the basics on the test-suite and CTMark. But I suspect more "in the wild" data would really be useful here given the significance of the change.

Thoughts? What else (beyond the four items above and feedback on run-time and compile-time) would folks like to see?

Once this happens, I'll also be preparing some batch, mechanical updates to the test suite to primarily use the new pass manager. There are also lots of documentation updates that will be needed here.

-Chandler

PS: I'll be sending a note to cfe-dev as a "heads up" about this discussion as in some ways, the default flip is mostly a Clang default flip. But hopefully our doc updates will trigger this being "perceived" as the default for other frontends, and I'll try to reach out to other major frontends as well (Swift and Rust are on my radar, and I've already started talking with Philip Reames about their Falcon JIT).

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
