[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha ahmed.bougacha at gmail.com
Wed Feb 25 16:57:37 PST 2015


Hi all,

I've started looking at the GlobalMerge pass, enabled by default on
ARM and AArch64.  I think we should reconsider that, at least for
AArch64.

As is, the pass just merges all globals together, in groups of 4KB on
AArch64 (128B on ARM).
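
To make the effect concrete, here is a rough source-level sketch of the
transformation.  The real pass works on LLVM IR, not on C++ source; the
names below (counter, flag, total, MergedGlobals) are made up for
illustration only.

```cpp
#include <cstddef>

// Before merging (conceptually): three independent file-static globals.
// Each one is addressed on its own.
namespace before {
static int counter;
static int flag;
static long total;
static long sum() { return counter + flag + total; }
}  // namespace before

// After merging (conceptually): the same variables become fields of one
// aggregate, so a single base address plus small constant offsets
// reaches all of them.
namespace after {
struct MergedGlobals {  // hypothetical name, for illustration
    int counter;
    int flag;
    long total;
};
static MergedGlobals merged;
static long sum() { return merged.counter + merged.flag + merged.total; }
}  // namespace after

// The first field sits at the base address of the merged aggregate.
static_assert(offsetof(after::MergedGlobals, counter) == 0,
              "first field is at the base address");
```

The flip side is that the individual variables no longer exist as
separate symbols once they are merged, and unrelated variables end up
bundled together.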

At the time it was enabled, the general thinking was "it's almost
free, it doesn't affect performance much, we might as well use it".
Now, it's preventing some link-time optimizations (as acknowledged in
one of the FIXMEs).


-- Performance impact
Overall, it isn't that profitable on the test-suite, and actually
degrades performance on a lot of other ("non-benchmark") projects I
tried, where globals are mostly file- or function-static variables,
only accessed through a single getter function.

Across several runs on the entire test-suite, when disabling the pass,
I measured:
without LTO, a -0.19% geomean improvement
with LTO, a +0.11% geomean regression.

As for just SPEC2006, disabling the pass causes two big regressions:
400.perlbench (10.6% with LTO, 2.7% without) and 471.omnetpp (2.3%
with LTO, 3.9% without).

Numbers are attached.
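
For readers unfamiliar with the term, a geomean figure like the ones
above can be computed roughly as follows.  This is a minimal sketch,
assuming the inputs are per-benchmark ratios (measurement with the pass
disabled divided by measurement with it enabled); the post does not
spell out the exact metric, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of strictly positive ratios, computed via logarithms
// to avoid overflow when multiplying many values.
double geomean(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double log_sum = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);  // a ratio of two measurements must be positive
        log_sum += std::log(r);
    }
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}

// Express the geomean ratio as a percent change relative to 1.0
// (assumption: 1.0 means "no change" for the chosen ratio).
double percent_change(const std::vector<double>& ratios) {
    return (geomean(ratios) - 1.0) * 100.0;
}
```

Unlike an arithmetic mean, this treats a 2x slowdown and a 2x speedup as
cancelling out, which is why it is the usual summary for benchmark
ratios.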


-- A way forward
One obvious way to improve it is: look at uses of globals, and try to
form sets of globals commonly used together.  The tricky part is to
define heuristics for "commonly".  Also, the pass then becomes much
more expensive.  I'm currently looking into improving it, and will
report if I come up with a good solution.  But this shouldn't stop us
from disabling it, for now.
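
As a starting point for discussion, the "used together" idea could be
sketched like this.  This is not an implementation of the pass, just a
toy model under stated assumptions: globals are plain indices, each
function is reduced to the list of globals it uses (no duplicates), and
"commonly" means "co-used in at least min_co_uses functions".  All
names are hypothetical, and the 4KB group size limit is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical input: for each function, the indices of the globals it uses.
using UseSets = std::vector<std::vector<std::size_t>>;

// Minimal union-find over global indices.
struct DisjointSets {
    std::vector<std::size_t> parent;
    explicit DisjointSets(std::size_t n) : parent(n) {
        std::iota(parent.begin(), parent.end(), std::size_t{0});
    }
    std::size_t find(std::size_t x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    void unite(std::size_t a, std::size_t b) { parent[find(a)] = find(b); }
};

// Returns one group id per global.  Two globals land in the same group
// if they are co-used by at least min_co_uses functions, directly or
// transitively through other globals.
std::vector<std::size_t> group_globals(std::size_t num_globals,
                                       const UseSets& uses,
                                       unsigned min_co_uses) {
    // Count, for every unordered pair, how many functions use both.
    std::map<std::pair<std::size_t, std::size_t>, unsigned> co_uses;
    for (const auto& fn : uses) {
        for (std::size_t i = 0; i < fn.size(); ++i) {
            for (std::size_t j = i + 1; j < fn.size(); ++j) {
                assert(fn[i] < num_globals && fn[j] < num_globals);
                std::pair<std::size_t, std::size_t> key(
                    std::min(fn[i], fn[j]), std::max(fn[i], fn[j]));
                ++co_uses[key];
            }
        }
    }

    // Merge the pairs that pass the threshold.
    DisjointSets sets(num_globals);
    for (const auto& entry : co_uses) {
        if (entry.second >= min_co_uses)
            sets.unite(entry.first.first, entry.first.second);
    }

    std::vector<std::size_t> group(num_globals);
    for (std::size_t g = 0; g < num_globals; ++g)
        group[g] = sets.find(g);
    return group;
}
```

Even this toy version shows both problems mentioned above: the result
depends entirely on the threshold, and the pair counting is quadratic
in the number of globals each function uses.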

Also, the pass seems like a good candidate for
-O3/CodeGenOpt::Aggressive.  However, the latter is implied by LTO,
which IMO shouldn't include these not-always-profitable optimizations.
That's another problem though.



Right now, I think we should disable the pass by default, until it's
deemed profitable enough.

-Ahmed


