[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?

Fri Feb 27 12:42:14 PST 2015

Hi Kristof,

Our tests are on iOS, which definitely uses the LOH optimizations for ARM64.

-Jim

> On Feb 26, 2015, at 2:33 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
> 
> Hi Ahmed,
> 
> Did you run these experiments on a platform with a linker that makes
> use of the AArch64CollectLOH-pass-produced information?
> I'm guessing that the AArch64CollectLOH-pass information and a linker
> that makes use of that information could affect the profitability of
> the GlobalMerge pass?
> 
> Thanks,
> 
> Kristof
> 
>> -----Original Message-----
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Ahmed Bougacha
>> Sent: 26 February 2015 01:13
>> To: LLVM Dev
>> Subject: Re: [LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
>> 
>> With the numbers!
>> -Ahmed
>> 
>> 
>> On Wed, Feb 25, 2015 at 4:57 PM, Ahmed Bougacha
>> <ahmed.bougacha at gmail.com> wrote:
>>> Hi all,
>>> 
>>> I've started looking at the GlobalMerge pass, enabled by default on
>>> ARM and AArch64.  I think we should reconsider that, at least for
>>> AArch64.
>>> 
>>> As is, the pass just merges all globals together, in groups of 4KB on
>>> AArch64 (128B on ARM).
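
To make the effect of merging concrete, here is a rough source-level sketch, not the pass's actual output. The names `counter_a`, `counter_b`, `MergedGlobals` and `merged` are hypothetical, and the real transformation happens on LLVM IR rather than on C++ source. The idea is that the merged globals become fields of one aggregate, so each can be reached from a single base address plus a constant offset.

```cpp
#include <cstddef>

// Before merging: two independent internal globals. Each access needs its
// own address computation.
static int counter_a = 1;
static int counter_b = 2;

// After merging (conceptually): one aggregate holding both values, so both
// are reachable from one base address plus a constant offset.
// MergedGlobals and merged are hypothetical names used for illustration.
struct MergedGlobals {
    int counter_a;
    int counter_b;
};
static MergedGlobals merged = {1, 2};

// Reads the two separate globals.
int sum_separate() { return counter_a + counter_b; }

// Reads the same values through the merged aggregate.
int sum_merged() { return merged.counter_a + merged.counter_b; }
```

Both functions return the same value; only the addressing differs. Because every global ends up inside one object, a tool that reasons about individual symbols no longer sees them separately, which is presumably why merging gets in the way of link-time optimizations.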
>>> 
>>> At the time it was enabled, the general thinking was "it's almost
>>> free, it doesn't affect performance much, we might as well use it".
>>> Now, it's preventing some link-time optimizations (as acknowledged in
>>> one of the FIXMEs).
>>> 
>>> 
>>> -- Performance impact
>>> Overall, it isn't that profitable on the test-suite, and actually
>>> degrades performance on a lot of other - "non-benchmark" - projects I
>>> tried (where the main reason to use a global is file- or function-
>>> static variables, only accessed through a single getter function).
>>> 
>>> Across several runs on the entire test-suite, when disabling the pass,
>>> I measured:
>>>   without LTO, a -0.19% geomean improvement
>>>   with LTO, a +0.11% geomean regression.
>>> 
>>> As for just SPEC2006, there are two big regressions: 400.perlbench
>>> (10.6% w/ LTO, 2.7% w/o) and 471.omnetpp (2.3% w/, 3.9% w/o).
>>> 
>>> Numbers are attached.
>>> 
>>> 
>>> -- A way forward
>>> One obvious way to improve it is: look at uses of globals, and try to
>>> form sets of globals commonly used together.  The tricky part is to
>>> define heuristics for "commonly".  Also, the pass then becomes much
>>> more expensive.  I'm currently looking into improving it, and will
>>> report if I come up with a good solution.  But this shouldn't stop us
>>> from disabling it, for now.
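
One possible shape for the "used together" grouping, as a minimal sketch only. It assumes "commonly" means "referenced together by at least `minSharedFunctions` functions", which is an illustrative heuristic and not something proposed in this thread. The names `Uses`, `DisjointSets` and `groupByCoUse` are hypothetical and are not LLVM APIs, and each function's list of globals is assumed to contain no duplicates.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical input: for each function, the indices of the globals it uses.
// Each inner list is assumed to contain no duplicate indices.
using Uses = std::vector<std::vector<std::size_t>>;

// Minimal union-find over global indices.
struct DisjointSets {
    std::vector<std::size_t> parent;

    explicit DisjointSets(std::size_t n) : parent(n) {
        std::iota(parent.begin(), parent.end(), std::size_t{0});
    }

    std::size_t find(std::size_t x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }

    void unite(std::size_t a, std::size_t b) { parent[find(a)] = find(b); }
};

// Returns one set id per global. Globals sharing an id are candidates for
// being merged together. Two globals are linked when at least
// minSharedFunctions functions use both of them (an assumed heuristic).
std::vector<std::size_t> groupByCoUse(std::size_t numGlobals, const Uses &uses,
                                      unsigned minSharedFunctions) {
    // Count, for every pair of globals, how many functions use both.
    std::map<std::pair<std::size_t, std::size_t>, unsigned> coUse;
    for (const auto &globals : uses) {
        for (std::size_t i = 0; i < globals.size(); ++i) {
            for (std::size_t j = i + 1; j < globals.size(); ++j) {
                std::size_t a = std::min(globals[i], globals[j]);
                std::size_t b = std::max(globals[i], globals[j]);
                ++coUse[{a, b}];
            }
        }
    }

    // Link every pair that is used together often enough.
    DisjointSets sets(numGlobals);
    for (const auto &entry : coUse) {
        if (entry.second >= minSharedFunctions) {
            sets.unite(entry.first.first, entry.first.second);
        }
    }

    std::vector<std::size_t> ids(numGlobals);
    for (std::size_t g = 0; g < numGlobals; ++g) {
        ids[g] = sets.find(g);
    }
    return ids;
}
```

The pairwise counting is quadratic in the number of globals each function touches, which gives a feel for why a use-aware version of the pass would be more expensive than merging everything blindly.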
>>> 
>>> Also, the pass seems like a good candidate for
>>> -O3/CodeGenOpt::Aggressive.  However, the latter is implied by LTO,
>>> which IMO shouldn't include these not-always-profitable optimizations.
>>> That's another problem though.
>>> 
>>> 
>>> 
>>> Right now, I think we should disable the pass by default, until it's
>>> deemed profitable enough.
>>> 
>>> -Ahmed