[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Kristof Beyls via llvm-dev llvm-dev at lists.llvm.org
Wed May 24 06:00:39 PDT 2017


On 23 May 2017, at 21:48, Quentin Colombet <qcolombet at apple.com> wrote:

Great!
I thought I had to look at our pipeline at O0 to make sure optimized regalloc was supported (I had https://bugs.llvm.org/show_bug.cgi?id=33022 in mind). Glad I was wrong; it saves me some time.

On May 22, 2017, at 12:51 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:


On 22 May 2017, at 09:09, Diana Picus <diana.picus at linaro.org> wrote:

Hi Quentin,

I actually did a run with -mllvm -optimize-regalloc -mllvm
-regalloc=greedy over the weekend and the test does pass with that.
Haven't measured the compile time though.

Cheers,
Diana

I also did my usual benchmarking run with the same options as Diana did above:
- Comparing against -O0 without globalisel: 2.5% performance drop, 0.8% code size improvement.

That's compared to a 9.5% performance drop and a 2.8% code size regression without that regalloc scheme, right?

Indeed.


- Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.

In summary, the measurements indicate some good improvements.
I also haven't measured the impact on compile time.

Do you have a way to make this measurement?
Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).

I did a quick setup with CTMark (part of the test-suite). I ran each of
* '-O0 -g',
* '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
* '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'
5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.
In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.
With the greedy register allocator enabled, this increases to 28%.
28% is probably too high? At the moment I can't think of an alternative to having a "constant materialization localizer" pass at -O0 to hit all the metrics we thought of as necessary before enabling GISel by default.
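
As a rough illustration of the procedure described above (five runs per configuration, take the median, then compare against the '-O0 -g' baseline), here is a minimal Python sketch. The function name and the timing values are hypothetical placeholders, not measured data, and this is not the script that was actually used.

```python
from statistics import median

def relative_compile_time(baseline_runs, candidate_runs):
    """Return the candidate's median compile time as a percentage
    of the baseline's median compile time."""
    return 100.0 * median(candidate_runs) / median(baseline_runs)

# Hypothetical wall-clock seconds for five runs each (illustrative only).
baseline = [10.2, 10.0, 10.1, 10.4, 10.0]   # '-O0 -g'
gisel = [10.5, 10.6, 10.4, 10.9, 10.5]      # '-O0 -g' plus the GlobalISel flags

print(f"{relative_compile_time(baseline, gisel):.1f}%")
```

Using the median rather than the mean keeps a single noisy run from skewing the per-benchmark number.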

It would be good if someone else could also do a compilation time experiment - just to make sure I didn't make any silly mistakes in my experiment.

Here are the details I see (compile time relative to '-O0 -g'):

Benchmark                                  gisel    gisel+greedy
CTMark/7zip/7zip-benchmark                 102.8%   106.5%
CTMark/Bullet/bullet                       100.5%   105.1%
CTMark/ClamAV/clamscan                     101.6%   130.8%
CTMark/SPASS/SPASS                         101.2%   120.0%
CTMark/consumer-typeset/consumer-typeset   105.7%   138.2%
CTMark/kimwitu++/kc                        103.1%   122.6%
CTMark/lencod/lencod                       106.2%   143.4%
CTMark/mafft/pairlocalalign                 96.2%   135.4%
CTMark/sqlite3/sqlite3                     109.1%   155.1%
CTMark/tramp3d-v4/tramp3d-v4               109.1%   132.0%
GEOMEAN                                    103.5%   128.0%
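
For anyone who wants to cross-check the summary row, the sketch below recomputes the geometric mean from the per-benchmark percentages in the table. It assumes the percentages are compile times relative to the '-O0 -g' baseline. The variable and function names are only illustrative.

```python
import math

# Per-benchmark percentages copied from the table, in row order.
gisel = [102.8, 100.5, 101.6, 101.2, 105.7, 103.1, 106.2, 96.2, 109.1, 109.1]
gisel_greedy = [106.5, 105.1, 130.8, 120.0, 138.2, 122.6, 143.4, 135.4, 155.1, 132.0]

def geomean(values):
    """Geometric mean, computed as exp of the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(f"gisel:        {geomean(gisel):.1f}%")
print(f"gisel+greedy: {geomean(gisel_greedy):.1f}%")
```

Rounded to one decimal place, the results should match the GEOMEAN row. The geometric mean is the usual choice for averaging ratios, since it treats a 2x slowdown and a 2x speedup symmetrically.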


Thanks,

Kristof